diff --git a/flang/documentation/ArrayComposition.md b/flang/docs/ArrayComposition.rst rename from flang/documentation/ArrayComposition.md rename to flang/docs/ArrayComposition.rst --- a/flang/documentation/ArrayComposition.md +++ b/flang/docs/ArrayComposition.rst @@ -1,11 +1,3 @@ - - This note attempts to describe the motivation for and design of an implementation of Fortran 90 (and later) array expression evaluation that minimizes the use of dynamically allocated temporary storage for @@ -15,38 +7,41 @@ The transformational intrinsic functions of Fortran of interest to us here include: -* Reductions to scalars (`SUM(X)`, also `ALL`, `ANY`, `COUNT`, - `DOT_PRODUCT`, - `IALL`, `IANY`, `IPARITY`, `MAXVAL`, `MINVAL`, `PARITY`, `PRODUCT`) -* Axial reductions (`SUM(X,DIM=)`, &c.) -* Location reductions to indices (`MAXLOC`, `MINLOC`, `FINDLOC`) -* Axial location reductions (`MAXLOC(DIM=`, &c.) -* `TRANSPOSE(M)` matrix transposition -* `RESHAPE` without `ORDER=` -* `RESHAPE` with `ORDER=` -* `CSHIFT` and `EOSHIFT` with scalar `SHIFT=` -* `CSHIFT` and `EOSHIFT` with array-valued `SHIFT=` -* `PACK` and `UNPACK` -* `MATMUL` -* `SPREAD` + +* Reductions to scalars (\ ``SUM(X)``\ , also ``ALL``\ , ``ANY``\ , ``COUNT``\ , + ``DOT_PRODUCT``\ , + ``IALL``\ , ``IANY``\ , ``IPARITY``\ , ``MAXVAL``\ , ``MINVAL``\ , ``PARITY``\ , ``PRODUCT``\ ) +* Axial reductions (\ ``SUM(X,DIM=)``\ , &c.) +* Location reductions to indices (\ ``MAXLOC``\ , ``MINLOC``\ , ``FINDLOC``\ ) +* Axial location reductions (\ ``MAXLOC(DIM=``\ , &c.) +* ``TRANSPOSE(M)`` matrix transposition +* ``RESHAPE`` without ``ORDER=`` +* ``RESHAPE`` with ``ORDER=`` +* ``CSHIFT`` and ``EOSHIFT`` with scalar ``SHIFT=`` +* ``CSHIFT`` and ``EOSHIFT`` with array-valued ``SHIFT=`` +* ``PACK`` and ``UNPACK`` +* ``MATMUL`` +* ``SPREAD`` Other Fortran intrinsic functions are technically transformational (e.g., -`COMMAND_ARGUMENT_COUNT`) but not of interest for this note. -The generic `REDUCE` is also not considered here. 
+``COMMAND_ARGUMENT_COUNT``\ ) but not of interest for this note. +The generic ``REDUCE`` is also not considered here. Arrays as functions =================== + A whole array can be viewed as a function that maps its indices to the values of its elements. Specifically, it is a map from a tuple of integers to its element type. The rank of the array is the number of elements in that tuple, and the shape of the array delimits the domain of the map. -`REAL :: A(N,M)` can be seen as a function mapping ordered pairs of integers -`(J,K)` with `1<=J<=N` and `1<=J<=M` to real values. +``REAL :: A(N,M)`` can be seen as a function mapping ordered pairs of integers +``(J,K)`` with ``1<=J<=N`` and ``1<=K<=M`` to real values. Array expressions as functions ============================== + The same perspective can be taken of an array expression comprising intrinsic operators and elemental functions. Fortran doesn't allow one to apply subscripts directly to an expression, @@ -54,28 +49,31 @@ as functions over index tuples by applying those indices to the arrays and subexpressions in the expression. -Consider `B = A + 1.0` (assuming `REAL :: A(N,M), B(N,M)`). +Consider ``B = A + 1.0`` (assuming ``REAL :: A(N,M), B(N,M)``\ ). The right-hand side of that assignment could be evaluated into a -temporary array `T` and then subscripted as it is copied into `B`. -``` -REAL, ALLOCATABLE :: T(:,:) -ALLOCATE(T(N,M)) -DO CONCURRENT(J=1:N,K=1:M) - T(J,K)=A(J,K) + 1.0 -END DO -DO CONCURRENT(J=1:N,K=1:M) - B(J,K)=T(J,K) -END DO -DEALLOCATE(T) -``` +temporary array ``T`` and then subscripted as it is copied into ``B``. + +.. 
code-block:: + + REAL, ALLOCATABLE :: T(:,:) + ALLOCATE(T(N,M)) + DO CONCURRENT(J=1:N,K=1:M) + T(J,K)=A(J,K) + 1.0 + END DO + DO CONCURRENT(J=1:N,K=1:M) + B(J,K)=T(J,K) + END DO + DEALLOCATE(T) + But we can avoid the allocation, population, and deallocation of the temporary by treating the right-hand side expression as if it -were a statement function `F(J,K)=A(J,K)+1.0` and evaluating -``` -DO CONCURRENT(J=1:N,K=1:M) - A(J,K)=F(J,K) -END DO -``` +were a statement function ``F(J,K)=A(J,K)+1.0`` and evaluating + +.. code-block:: + + DO CONCURRENT(J=1:N,K=1:M) + B(J,K)=F(J,K) + END DO In general, when a Fortran array assignment to a non-allocatable array does not include the left-hand @@ -85,50 +83,54 @@ Transformational intrinsic functions as function composition ============================================================ + Many of the transformational intrinsic functions listed above can, when their array arguments are viewed as functions over their index tuples, be seen as compositions of those functions with functions of the "incoming" indices -- yielding a function for an entire right-hand side of an array assignment statement. -For example, the application of `TRANSPOSE(A + 1.0)` to the index -tuple `(J,K)` becomes `A(K,J) + 1.0`. +For example, the application of ``TRANSPOSE(A + 1.0)`` to the index +tuple ``(J,K)`` becomes ``A(K,J) + 1.0``. Partial (axial) reductions can be similarly composed. -The application of `SUM(A,DIM=2)` to the index `J` is the -complete reduction `SUM(A(J,:))`. +The application of ``SUM(A,DIM=2)`` to the index ``J`` is the +complete reduction ``SUM(A(J,:))``. More completely: -* Reductions to scalars (`SUM(X)` without `DIM=`) become + + +* Reductions to scalars (\ ``SUM(X)`` without ``DIM=``\ ) become runtime calls; the result needs no dynamic allocation, being a scalar. -* Axial reductions (`SUM(X,DIM=d)`) applied to indices `(J,K)` - become scalar values like `SUM(X(J,K,:))` if `d=3`. 
-* Location reductions to indices (`MAXLOC(X)` without `DIM=`) +* Axial reductions (\ ``SUM(X,DIM=d)``\ ) applied to indices ``(J,K)`` + become scalar values like ``SUM(X(J,K,:))`` if ``d=3``. +* Location reductions to indices (\ ``MAXLOC(X)`` without ``DIM=``\ ) do not require dynamic allocation, since their results are - either scalar or small vectors of length `RANK(X)`. -* Axial location reductions (`MAXLOC(X,DIM=)`, &c.) - are handled like other axial reductions like `SUM(DIM=)`. -* `TRANSPOSE(M)` exchanges the two components of the index tuple. -* `RESHAPE(A,SHAPE=s)` without `ORDER=` must precompute the shape - vector `S`, and then use it to linearize indices into offsets - in the storage order of `A` (whose shape must also be captured). + either scalar or small vectors of length ``RANK(X)``. +* Axial location reductions (\ ``MAXLOC(X,DIM=)``\ , &c.) + are handled like other axial reductions like ``SUM(DIM=)``. +* ``TRANSPOSE(M)`` exchanges the two components of the index tuple. +* ``RESHAPE(A,SHAPE=s)`` without ``ORDER=`` must precompute the shape + vector ``S``\ , and then use it to linearize indices into offsets + in the storage order of ``A`` (whose shape must also be captured). These conversions can involve division and/or modulus, which can be optimized into a fixed-point multiplication using the usual technique. -* `RESHAPE` with `ORDER=` is similar, but must permute the - components of the index tuple; it generalizes `TRANSPOSE`. -* `CSHIFT` applies addition and modulus. -* `EOSHIFT` applies addition and a conditional move (`MERGE`). -* `PACK` and `UNPACK` are likely to require a runtime call. -* `MATMUL(A,B)` can become `DOT_PRODUCT(A(J,:),B(:,K))`, but +* ``RESHAPE`` with ``ORDER=`` is similar, but must permute the + components of the index tuple; it generalizes ``TRANSPOSE``. +* ``CSHIFT`` applies addition and modulus. +* ``EOSHIFT`` applies addition and a conditional move (\ ``MERGE``\ ). 
+* ``PACK`` and ``UNPACK`` are likely to require a runtime call. +* ``MATMUL(A,B)`` can become ``DOT_PRODUCT(A(J,:),B(:,K))``\ , but might benefit from calling a highly optimized runtime routine. -* `SPREAD(A,DIM=d,NCOPIES=n)` for compile-time `d` simply - applies `A` to a reduced index tuple. +* ``SPREAD(A,DIM=d,NCOPIES=n)`` for compile-time ``d`` simply + applies ``A`` to a reduced index tuple. Determination of rank and shape =============================== + An important part of evaluating array expressions without the use of temporary storage is determining the shape of the result prior to, or without, evaluating the elements of the result. @@ -138,22 +140,23 @@ But it is possible to determine the shapes of the results of many transformational intrinsic function calls as well. -* `SHAPE(SUM(X,DIM=d))` is `SHAPE(X)` with one element removed: - `PACK(SHAPE(X),[(j,j=1,RANK(X))]/=d)` in general. - (The `DIM=` argument is commonly a compile-time constant.) -* `SHAPE(MAXLOC(X))` is `[RANK(X)]`. -* `SHAPE(MAXLOC(X,DIM=d))` is `SHAPE(X)` with one element removed. -* `SHAPE(TRANSPOSE(M))` is a reversal of `SHAPE(M)`. -* `SHAPE(RESHAPE(..., SHAPE=S))` is `S`. -* `SHAPE(CSHIFT(X))` is `SHAPE(X)`; same with `EOSHIFT`. -* `SHAPE(PACK(A,VECTOR=V))` is `SHAPE(V)` -* `SHAPE(PACK(A,MASK=m))` with non-scalar `m` and without `VECTOR=` is `[COUNT(m)]`. -* `RANK(PACK(...))` is always 1. -* `SHAPE(UNPACK(MASK=M))` is `SHAPE(M)`. -* `SHAPE(MATMUL(A,B))` drops one value from `SHAPE(A)` and another from `SHAPE(B)`. -* `SHAPE(SHAPE(X))` is `[RANK(X)]`. -* `SHAPE(SPREAD(A,DIM=d,NCOPIES=n))` is `SHAPE(A)` with `n` inserted at - dimension `d`. + +* ``SHAPE(SUM(X,DIM=d))`` is ``SHAPE(X)`` with one element removed: + ``PACK(SHAPE(X),[(j,j=1,RANK(X))]/=d)`` in general. + (The ``DIM=`` argument is commonly a compile-time constant.) +* ``SHAPE(MAXLOC(X))`` is ``[RANK(X)]``. +* ``SHAPE(MAXLOC(X,DIM=d))`` is ``SHAPE(X)`` with one element removed. 
+* ``SHAPE(TRANSPOSE(M))`` is a reversal of ``SHAPE(M)``. +* ``SHAPE(RESHAPE(..., SHAPE=S))`` is ``S``. +* ``SHAPE(CSHIFT(X))`` is ``SHAPE(X)``\ ; same with ``EOSHIFT``. +* ``SHAPE(PACK(A,VECTOR=V))`` is ``SHAPE(V)``. +* ``SHAPE(PACK(A,MASK=m))`` with non-scalar ``m`` and without ``VECTOR=`` is ``[COUNT(m)]``. +* ``RANK(PACK(...))`` is always 1. +* ``SHAPE(UNPACK(MASK=M))`` is ``SHAPE(M)``. +* ``SHAPE(MATMUL(A,B))`` drops one value from ``SHAPE(A)`` and another from ``SHAPE(B)``. +* ``SHAPE(SHAPE(X))`` is ``[RANK(X)]``. +* ``SHAPE(SPREAD(A,DIM=d,NCOPIES=n))`` is ``SHAPE(A)`` with ``n`` inserted at + dimension ``d``. This is useful because expression evaluations that *do* require temporaries to hold their results (due to the context in which the evaluation occurs) @@ -163,7 +166,7 @@ intrinsic in the runtime library, can be designed with an API that includes a pointer to the destination array as an argument. -Statements like `ALLOCATE(A,SOURCE=expression)` should thus be capable +Statements like ``ALLOCATE(A,SOURCE=expression)`` should thus be capable of evaluating their array expressions directly into the newly-allocated storage for the allocatable array. The implementation would generate code to calculate the shape, use it @@ -175,6 +178,7 @@ Automatic reallocation of allocatables ====================================== + Fortran 2003 introduced the ability to assign non-conforming array expressions to ALLOCATABLE arrays with the implied semantics of reallocation to the new shape. @@ -184,25 +188,27 @@ Rewriting rules =============== -Let `{...}` denote an ordered tuple of 1-based indices, e.g. `{j,k}`, into + +Let ``{...}`` denote an ordered tuple of 1-based indices, e.g. ``{j,k}``\ , into the result of an array expression or subexpression. + * Array constructors always yield vectors; higher-rank arrays that appear as - constituents are flattened; so `[X] => RESHAPE(X,SHAPE=[SIZE(X)})`. + constituents are flattened; so ``[X] => RESHAPE(X,SHAPE=[SIZE(X)])``. 
 * Array constructors with multiple constituents are concatenations of - their constituents; so `[X,Y]{j} => MERGE(Y{j-SIZE(X)},X{j},J>SIZE(X))`. + their constituents; so ``[X,Y]{j} => MERGE(Y{j-SIZE(X)},X{j},j>SIZE(X))``. * Array constructors with implied DO loops are difficult when nested triangularly. * Whole array references can have lower bounds other than 1, so - `A => A(LBOUND(A,1):UBOUND(A,1),...)`. -* Array sections simply apply indices: `A(i:...:n){j} => A(i1+n*(j-1))`. -* Vector-valued subscripts apply indices to the subscript: `A(N(:)){j} => A(N(:){j})`. -* Scalar operands ignore indices: `X{j,k} => X`. + ``A => A(LBOUND(A,1):UBOUND(A,1),...)``. +* Array sections simply apply indices: ``A(i:...:n){j} => A(i+n*(j-1))``. +* Vector-valued subscripts apply indices to the subscript: ``A(N(:)){j} => A(N(:){j})``. +* Scalar operands ignore indices: ``X{j,k} => X``. Further, they are evaluated at most once. * Elemental operators and functions apply indices to their arguments: - `(A(:,:) + B(:,:)){j,k}` => A(:,:){j,k} + B(:,:){j,k}`. -* `TRANSPOSE(X){j,k} => X{k,j}`. -* `SPREAD(X,DIM=2,...){j,k} => X{j}`; i.e., the contents are replicated. + ``(A(:,:) + B(:,:)){j,k} => A(:,:){j,k} + B(:,:){j,k}``. +* ``TRANSPOSE(X){j,k} => X{k,j}``. +* ``SPREAD(X,DIM=2,...){j,k} => X{j}``\ ; i.e., the contents are replicated. If X is sufficiently expensive to compute elementally, it might be evaluated into a temporary. diff --git a/flang/docs/BijectiveInternalNameUniquing.rst b/flang/docs/BijectiveInternalNameUniquing.rst new file mode 100644 --- /dev/null +++ b/flang/docs/BijectiveInternalNameUniquing.rst @@ -0,0 +1,188 @@ + +Bijective Internal Name Uniquing +-------------------------------- + +FIR has a flat namespace. No two objects may have the same name at +the module level. (These would be functions, globals, etc.) +This necessitates some sort of encoding scheme to unique +symbols from the front-end into FIR. 
+ +Another requirement is +to be able to reverse these unique names and recover the associated +symbol in the symbol table. + +Fortran is case insensitive, which allows the compiler to convert the +user's identifiers to all lower case. Such a universal conversion implies +that all upper case letters are available for use in uniquing. + +Prefix ``_Q`` +^^^^^^^^^^^^^^^^^ + +All uniqued names have the prefix sequence ``_Q`` to indicate the name has +been uniqued. (Q is chosen because it is a +`low frequency letter `_ +in English.) + +Scope Building +^^^^^^^^^^^^^^ + +Symbols can be scoped by the module, submodule, or procedure that contains +that symbol. After the ``_Q`` sigil, names are constructed from outermost to +innermost scope as + + +* Module name prefixed with ``M`` +* Submodule name prefixed with ``S`` +* Procedure name prefixed with ``F`` + +Given: + +.. code-block:: + + submodule (mod:s1mod) s2mod + ... + subroutine sub + ... + contains + function fun + +The uniqued name of ``fun`` becomes: + +.. code-block:: + + _QMmodSs1modSs2modFsubPfun + +Common blocks +^^^^^^^^^^^^^ + + +* A common block name will be prefixed with ``B`` + +Given: + +.. code-block:: + + common /variables/ i, j + +The uniqued name of ``variables`` becomes: + +.. code-block:: + + _QBvariables + +Given: + +.. code-block:: + + common i, j + +The uniqued name in case of ``blank common block`` becomes: + +.. code-block:: + + _QB + +Module scope global data +^^^^^^^^^^^^^^^^^^^^^^^^ + + +* A global data entity is prefixed with ``E`` +* A global entity that is constant (parameter) will be prefixed with ``EC`` + +Given: + +.. code-block:: + + module mod + integer :: intvar + real, parameter :: pi = 3.14 + end module + +The uniqued name of ``intvar`` becomes: + +.. code-block:: + + _QMmodEintvar + +The uniqued name of ``pi`` becomes: + +.. code-block:: + + _QMmodECpi + +Procedures/Subprograms +^^^^^^^^^^^^^^^^^^^^^^ + + +* A procedure/subprogram is prefixed with ``P`` + +Given: + +.. 
code-block:: + + subroutine sub + +The uniqued name of ``sub`` becomes: + +.. code-block:: + + _QPsub + +Derived types and related +^^^^^^^^^^^^^^^^^^^^^^^^^ + + +* A derived type is prefixed with ``T`` +* If a derived type has KIND parameters, they are listed in a consistent + canonical order where each takes the form ``Ki`` and where *i* is the + compile-time constant value. (All type parameters are integer.) If *i* + is a negative value, the prefix ``KN`` will be used and *i* will reflect + the magnitude of the value. + +Given: + +.. code-block:: + + module mymodule + type mytype + integer :: member + end type + ... + +The uniqued name of ``mytype`` becomes: + +.. code-block:: + + _QMmymoduleTmytype + +Given: + +.. code-block:: + + type yourtype(k1,k2) + integer, kind :: k1, k2 + real :: mem1 + complex :: mem2 + end type + +The uniqued name of ``yourtype`` where ``k1=4`` and ``k2=-6`` (at compile-time): + +.. code-block:: + + _QTyourtypeK4KN6 + + +* A derived type dispatch table is prefixed with ``D``. The dispatch table + for ``type t`` would be ``_QDTt`` +* A type descriptor instance is prefixed with ``C``. Intrinsic types can + be encoded with their names and kinds. The type descriptor for the + type ``yourtype`` above would be ``_QCTyourtypeK4KN6``. The type + descriptor for ``REAL(4)`` would be ``_QCrealK4``. + +Compiler generated names +^^^^^^^^^^^^^^^^^^^^^^^^ + +Compiler generated names do not have to be mapped back to Fortran. These +names will be prefixed with ``_QQ`` and followed by a unique compiler +generated identifier. There is, of course, no mapping back to a symbol +derived from the input source in this case as no such symbol exists. 
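The encoding rules above can be sketched in a few lines of C++. This is only an illustration under stated assumptions: the names ``Scope``, ``ScopeKind``, ``EntityKind``, and ``UniqueName`` are hypothetical and are not flang's actual API, and only the prefixes that appear in this document's examples are handled (dispatch tables, type descriptors, and ``_QQ`` names are left out).

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper types for illustration; not part of flang.
enum class ScopeKind { Module, Submodule, Procedure };
enum class EntityKind { Procedure, Variable, Constant, DerivedType, CommonBlock };

struct Scope {
  ScopeKind kind;
  std::string name; // assumed to be lower-cased already
};

// Prefix letter for an enclosing scope: M, S, or F.
char ScopePrefix(ScopeKind kind) {
  switch (kind) {
  case ScopeKind::Module:
    return 'M';
  case ScopeKind::Submodule:
    return 'S';
  case ScopeKind::Procedure:
    return 'F';
  }
  return '?';
}

// Prefix for the entity itself: P, E, EC, T, or B.
const char *EntityPrefix(EntityKind kind) {
  switch (kind) {
  case EntityKind::Procedure:
    return "P";
  case EntityKind::Variable:
    return "E";
  case EntityKind::Constant:
    return "EC";
  case EntityKind::DerivedType:
    return "T";
  case EntityKind::CommonBlock:
    return "B";
  }
  return "?";
}

// Builds "_Q", then the scopes from outermost to innermost, then the entity,
// then any KIND parameter values (K<i>, or KN<|i|> for negative values).
std::string UniqueName(const std::vector<Scope> &scopes, EntityKind kind,
    const std::string &name, const std::vector<std::int64_t> &kindParams = {}) {
  std::string result{"_Q"};
  for (const Scope &scope : scopes) {
    result += ScopePrefix(scope.kind);
    result += scope.name;
  }
  result += EntityPrefix(kind);
  result += name;
  for (std::int64_t k : kindParams) {
    if (k < 0) {
      result += "KN" + std::to_string(-k);
    } else {
      result += "K" + std::to_string(k);
    }
  }
  return result;
}
```

Because every prefix is an upper-case letter and user identifiers are lower case, a decoder can split such a name at the upper-case letters to recover the scopes, which is what makes the mapping reversible.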
diff --git a/flang/documentation/C++17.md b/flang/docs/C++17.rst rename from flang/documentation/C++17.md rename to flang/docs/C++17.rst --- a/flang/documentation/C++17.md +++ b/flang/docs/C++17.rst @@ -1,12 +1,5 @@ - - -## C++14/17 features used in f18 +C++14/17 features used in f18 +----------------------------- The C++ dialect used in this project constitutes a subset of the standard C++ programming language and library features. @@ -23,16 +16,19 @@ We have chosen to use some features of the recent C++17 language standard in f18. The most important of these are: -* sum types (discriminated unions) in the form of `std::variant` -* `using` template parameter packs -* generic lambdas with `auto` argument types -* product types in the form of `std::tuple` -* `std::optional` -(`std::tuple` is actually a C++11 feature, but I include it + +* sum types (discriminated unions) in the form of ``std::variant`` +* ``using`` template parameter packs +* generic lambdas with ``auto`` argument types +* product types in the form of ``std::tuple`` +* ``std::optional`` + +(\ ``std::tuple`` is actually a C++11 feature, but I include it in this list because it's not particularly well known.) -### Sum types +Sum types +^^^^^^^^^ First, some background information to explain the need for sum types in f18. @@ -55,9 +51,9 @@ backtracking. It is constructed as the incremental composition of pure parsing functions that each, when given a context (location in the input stream plus some state), -either _succeeds_ or _fails_ to recognize some piece of Fortran. +either *succeeds* or *fails* to recognize some piece of Fortran. On success, they return a new state and some semantic value, and this is -usually an instance of a C++ `struct` type that encodes the semantic +usually an instance of a C++ ``struct`` type that encodes the semantic content of a production in the Fortran grammar. 
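To make the idea of composing pure parsing functions concrete, here is a deliberately tiny sketch. It is not f18's parser: ``ParseState``, ``Digit``, ``ParseDigit``, ``ParseHexLetter``, and ``Alternatives`` are all hypothetical names invented for this example, and the real parser state carries far more than an input string and a position.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <utility>

// Parsing context: the input and the current location in it.
struct ParseState {
  std::string input;
  std::size_t at{0};
};

// Semantic value for one tiny "production".
struct Digit {
  int value;
};

// Success yields the semantic value plus the new state; failure yields nothing.
using ParseResult = std::optional<std::pair<Digit, ParseState>>;

// A pure parsing function: it never modifies the state it is given, so a
// caller can retry from the same state after a failure (backtracking).
ParseResult ParseDigit(const ParseState &state) {
  if (state.at < state.input.size()) {
    char ch{state.input[state.at]};
    if (ch >= '0' && ch <= '9') {
      return std::make_pair(
          Digit{ch - '0'}, ParseState{state.input, state.at + 1});
    }
  }
  return std::nullopt;
}

ParseResult ParseHexLetter(const ParseState &state) {
  if (state.at < state.input.size()) {
    char ch{state.input[state.at]};
    if (ch >= 'a' && ch <= 'f') {
      return std::make_pair(
          Digit{ch - 'a' + 10}, ParseState{state.input, state.at + 1});
    }
  }
  return std::nullopt;
}

// A combinator: builds a new parser that tries pa and, if it fails, tries pb
// from the same starting state.
template <typename PA, typename PB> auto Alternatives(PA pa, PB pb) {
  return [pa, pb](const ParseState &state) -> ParseResult {
    ParseResult result{pa(state)};
    return result.has_value() ? result : pb(state);
  };
}
```

A parser for one hexadecimal digit is then just ``Alternatives(ParseDigit, ParseHexLetter)``; larger grammars are built the same way, by composing small parsers into bigger ones.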
This technique allows us to specify both the Fortran grammar and the @@ -77,63 +73,67 @@ To represent nodes in the Fortran parse tree, we need a means of handling sum types for productions that have multiple alternatives. -The bounded polymorphism supplied by the C++17 `std::variant` fits +The bounded polymorphism supplied by the C++17 ``std::variant`` fits those needs exactly. For example, production R502 in Fortran defines the top-level program unit of Fortran as being a function, subroutine, module, &c. -The `struct ProgramUnit` in the f18 parse tree header file -represents each program unit with a member that is a `std::variant` +The ``struct ProgramUnit`` in the f18 parse tree header file +represents each program unit with a member that is a ``std::variant`` over the six possibilities. Similarly, the parser for that type in the f18 grammar has six alternatives, -each of which constructs an instance of `ProgramUnit` upon the result of -parsing a `Module`, `FunctionSubprogram`, and so on. +each of which constructs an instance of ``ProgramUnit`` upon the result of +parsing a ``Module``\ , ``FunctionSubprogram``\ , and so on. Code that performs semantic analysis on the result of a successful parse is typically implemented with overloaded functions. -A function instantiated on `ProgramUnit` will use `std::visit` to +A function instantiated on ``ProgramUnit`` will use ``std::visit`` to identify the right alternative and perform the right actions. -The call to `std::visit` must pass a visitor that can handle all +The call to ``std::visit`` must pass a visitor that can handle all of the possibilities, and f18 will fail to build if one is missing. 
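As a rough illustration of this pattern, the following self-contained sketch uses a cut-down, hypothetical ``ProgramUnit`` with three alternatives rather than the six in the real parse tree; the struct contents and the ``Describe`` function are invented for the example. The ``visitors`` helper is the common overloaded-lambdas idiom, written out locally here; it is only assumed to resemble what f18 provides, and it relies on C++17's ``using`` over a template parameter pack.

```cpp
#include <string>
#include <variant>

// Overloaded-lambdas idiom: inherit the call operators of all the lambdas.
template <typename... LAMBDAS> struct visitors : LAMBDAS... {
  using LAMBDAS::operator()...;
};
template <typename... LAMBDAS> visitors(LAMBDAS...) -> visitors<LAMBDAS...>;

// Simplified stand-ins for parse tree nodes (hypothetical contents).
struct Module {
  std::string name;
};
struct FunctionSubprogram {
  std::string name;
};
struct SubroutineSubprogram {
  std::string name;
};

// A sum type: exactly one of the alternatives is present at any time.
struct ProgramUnit {
  std::variant<Module, FunctionSubprogram, SubroutineSubprogram> u;
};

// std::visit selects the lambda that matches the active alternative.
std::string Describe(const ProgramUnit &unit) {
  return std::visit(
      visitors{
          [](const Module &x) { return "module " + x.name; },
          [](const FunctionSubprogram &x) { return "function " + x.name; },
          [](const SubroutineSubprogram &x) { return "subroutine " + x.name; },
      },
      unit.u);
}
```

Deleting any one of the three lambdas makes the call to ``std::visit`` ill-formed, so a missing case is caught when the code is compiled rather than when it runs.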
-Were we unable to use `std::variant` directly, we would likely -have chosen to implement a local `SumType` replacement; in the -absence of C++17's abilities of `using` a template parameter pack -and allowing `auto` arguments in anonymous lambda functions, +Were we unable to use ``std::variant`` directly, we would likely +have chosen to implement a local ``SumType`` replacement; in the +absence of C++17's abilities of ``using`` a template parameter pack +and allowing ``auto`` arguments in anonymous lambda functions, it would be less convenient to use. The other options for polymorphism in C++ at the level of C++11 would be to: + + * loosen up compile-time type safety and use a unified parse tree node representation with an enumeration type for an operator and generic subtree pointers, or * define the sum types for the parse tree as abstract base classes from which each particular alternative would derive, and then use virtual - functions (or the forbidden `dynamic_cast`) to identify alternatives + functions (or the forbidden ``dynamic_cast``\ ) to identify alternatives during analysis -### Product types +Product types +^^^^^^^^^^^^^ Many productions in the Fortran grammar describe a sequence of various sub-parses. For example, R504 defines the things that may appear in the "specification -part" of a subprogram in the order in which they are allowed: `USE` -statements, then `IMPORT` statements, and so on. +part" of a subprogram in the order in which they are allowed: ``USE`` +statements, then ``IMPORT`` statements, and so on. The parse tree node that represents such a thing needs to incorporate the representations of those parses, of course. It turns out to be convenient to allow these data members to be anonymous -components of a `std::tuple` product type. +components of a ``std::tuple`` product type. 
This type facilitates the automation of code that walks over all of the members in a type-safe fashion and avoids the need to invent and remember -needless member names -- the components of a `std::tuple` instance can +needless member names -- the components of a ``std::tuple`` instance can be identified and accessed in terms of their types, and those tend to be distinct. -So we use `std::tuple` for such things. +So we use ``std::tuple`` for such things. It has also been handy for template metaprogramming that needs to work with lists of types. -### `std::optional` +``std::optional`` +^^^^^^^^^^^^^^^^^^^^^ This simple little type is used wherever a value might or might not be present. diff --git a/flang/docs/C++style.rst b/flang/docs/C++style.rst new file mode 100644 --- /dev/null +++ b/flang/docs/C++style.rst @@ -0,0 +1,428 @@ +In brief: +--------- + + +* Use *clang-format* + from llvm 7 + on all C++ source and header files before + every merge to master. All code layout should be determined + by means of clang-format. +* Where a clear precedent exists in the project, follow it. +* Otherwise, where `LLVM's C++ style guide `_ + is clear on usage, follow it. +* Otherwise, where a good public C++ style guide is relevant and clear, + follow it. `Google's `_ + is pretty good and comes with lots of justifications for its rules. +* Reasonable exceptions to these guidelines can be made. +* Be aware of some workarounds for known issues in older C++ compilers that should + still be able to compile f18. They are listed at the end of this document. + +In particular: +-------------- + +Use serial commas in comments, error messages, and documentation +unless they introduce ambiguity. + +Error messages +^^^^^^^^^^^^^^ + + +#. Messages should be a single sentence with few exceptions. +#. Fortran keywords should appear in upper case. +#. Names from the program appear in single quotes. +#. Messages should start with a capital letter. +#. Messages should not end with a period. 
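Two of these rules are mechanical enough to check in code. The helper below is a hypothetical lint-style sketch, not something that exists in f18, and the sample messages in the usage note are made up for illustration; it checks only that a message starts with a capital letter and does not end with a period.

```cpp
#include <cctype>
#include <string>

// Hypothetical check for two of the rules above: the message starts with a
// capital letter and does not end with a period.
bool FollowsMessageConventions(const std::string &message) {
  if (message.empty()) {
    return false;
  }
  bool startsWithCapital{
      std::isupper(static_cast<unsigned char>(message.front())) != 0};
  bool endsWithPeriod{message.back() == '.'};
  return startsWithCapital && !endsWithPeriod;
}
```

For example, ``"DO variable 'j' must not be redefined"`` passes this check, while ``"do variable 'j' must not be redefined."`` fails it on both counts.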
+ +Files +^^^^^ + + +#. File names should use dashes, not underscores. C++ sources have the + extension ".cpp", not ".C" or ".cc" or ".cxx". Don't create needless + source directory hierarchies. +#. Header files should be idempotent. Use the usual technique: + .. code-block:: + + #ifndef FORTRAN_header_H_ + #define FORTRAN_header_H_ + // code + #endif // FORTRAN_header_H_ + +#. ``#include`` every header defining an entity that your project header or source + file actually uses directly. (Exception: when foo.cpp starts, as it should, + with ``#include "foo.h"``\ , and foo.h includes bar.h in order to define the + interface to the module foo, you don't have to redundantly ``#include "bar.h"`` + in foo.cpp.) +#. In the source file "foo.cpp", put its corresponding ``#include "foo.h"`` + first in the sequence of inclusions. + Then ``#include`` other project headers in alphabetic order; then C++ standard + headers, also alphabetically; then C and system headers. +#. Don't use ``#include <iostream>``. If you need it for temporary debugging, + remove the inclusion before committing. + +Naming +^^^^^^ + + +#. C++ names that correspond to well-known interfaces from the STL, LLVM, + and Fortran standard + can and should look like their models when the reader can safely assume that + they mean the same thing -- e.g., ``clear()`` and ``size()`` member functions + in a class that implements an STL-ish container. + Fortran intrinsic function names are conventionally in ALL CAPS. +#. Non-public data members should be named with leading minuscule (lower-case) + letters, internal camelCase capitalization, and a trailing underscore, + e.g. ``DoubleEntryBookkeepingSystem myLedger_;``. POD structures with + only public data members shouldn't use trailing underscores, since they + don't have class functions from which data members need to be distinguishable. +#. Accessor member functions are named with the non-public data member's name, + less the trailing underscore. 
Mutator member functions are named ``set_...`` + and should return ``*this``. Don't define accessors or mutators needlessly. +#. Other class functions should be named with leading capital letters, + CamelCase, and no underscores, and, like all functions, should be based + on imperative verbs, e.g. ``HaltAndCatchFire()``. +#. It is fine to use short names for local variables with limited scopes, + especially when you can declare them directly in a ``for()``\ /\ ``while()``\ /\ ``if()`` + condition. Otherwise, prefer complete English words to abbreviations + when creating names. + +Commentary +^^^^^^^^^^ + + +#. Use ``//`` for all comments except for short ``/*notes*/`` within expressions. +#. When ``//`` follows code on a line, precede it with two spaces. +#. Comments should matter. Assume that the reader knows current C++ at least as + well as you do and avoid distracting her by calling out usage of new + features in comments. + +Layout +^^^^^^ + +Always run ``clang-format`` on your changes before committing code. LLVM +has a ``git-clang-format`` script to facilitate running clang-format only +on the lines that have changed. + +Here's what you can expect to see ``clang-format`` do: + + +#. Indent with two spaces. +#. Don't indent public:, protected:, and private: + accessibility labels. +#. Never use more than 80 characters per source line. +#. Don't use tabs. +#. Don't indent the bodies of namespaces, even when nested. +#. Function result types go on the same line as the function and argument + names. + +Don't try to make columns of variable names or comments +align vertically -- they are maintenance problems. + +Always wrap the bodies of ``if()``\ , ``else``\ , ``while()``\ , ``for()``\ , ``do``\ , &c. +with braces, even when the body is a single statement or empty. The +opening ``{`` goes on +the end of the line, not on the next line. Functions also put the opening +``{`` after the formal arguments or new-style result type, not on the next +line. 
Use ``{}`` for empty inline constructors and destructors in classes. + +If any branch of an ``if``\ /\ ``else if``\ /\ ``else`` cascade ends with a return statement, +they all should, with the understanding that the cases are all unexceptional. +When testing for an error case that should cause an early return, do so with +an ``if`` that doesn't have a following ``else``. + +Don't waste space on the screen with needless blank lines or elaborate block +commentary (lines of dashes, boxes of asterisks, &c.). Write code so as to be +easily read and understood with a minimum of scrolling. + +Avoid using assignments in controlling expressions of ``if()`` &c., even with +the idiom of wrapping them with extra parentheses. + +In multi-element initializer lists (especially ``common::visitors{...}``\ ), +including a comma after the last element often causes ``clang-format`` to do +a better job of formatting. + +C++ language +^^^^^^^^^^^^ + +Use *C++17*\ , unless some compiler to which we must be portable lacks a feature +you are considering. +However: + + +#. Never throw or catch exceptions. +#. Never use run-time type information or ``dynamic_cast<>``. +#. Never declare static data that executes a constructor. + (This is why ``#include <iostream>`` is contraindicated.) +#. Use ``{braced initializers}`` in all circumstances where they work, including + default data member initialization. They inhibit implicit truncation. + Don't use ``= expr`` initialization just to effect implicit truncation; + prefer an explicit ``static_cast<>``. + With C++17, braced initializers work fine with ``auto`` too. + Sometimes, however, there are better alternatives to empty braces; + e.g., prefer ``return std::nullopt;`` to ``return {};`` to make it more clear + that the function's result type is a ``std::optional<>``. +#. Avoid unsigned types apart from ``size_t``\ , which must be used with care. + When ``int`` just obviously works, just use ``int``. 
When you need something + bigger than ``int``\ , use ``std::int64_t`` rather than ``long`` or ``long long``. +#. Use namespaces to avoid conflicts with client code. Use one top-level + ``Fortran`` project namespace. Don't introduce needless nested namespaces within the + project when names don't conflict or better solutions exist. Never use + ``using namespace ...;`` outside test code; never use ``using namespace std;`` + anywhere. Access STL entities with names like ``std::unique_ptr<>``\ , + without a leading ``::``. +#. Prefer ``static`` functions over functions in anonymous namespaces in source files. +#. Use ``auto`` judiciously. When the type of a local variable is known, + monomorphic, and easy to type, be explicit rather than using ``auto``. + Don't use ``auto`` functions unless the type of the result of an outlined member + function definition can be more clear due to its use of types declared in the + class. +#. Use move semantics and smart pointers to make dynamic memory ownership + clear. Consider reworking any code that uses ``malloc()`` or a (non-placement) + ``operator new``. + See the section on Pointers below for some suggested options. +#. When defining argument types, use values when object semantics are + not required and the value is small and copyable without allocation + (e.g., ``int``\ ); + use ``const`` or rvalue references for larger values (e.g., ``std::string``\ ); + use ``const`` references rather than pointers to refer to immutable objects; + and use non-\ ``const`` references for mutable objects, including "output" arguments + when they can't be function results. + Put such output arguments last (\ *pace* the standard C library conventions for ``memcpy()`` & al.). +#. Prefer ``typename`` to ``class`` in template argument declarations. +#. Prefer ``enum class`` to plain ``enum`` wherever ``enum class`` will work. + We have an ``ENUM_CLASS`` macro that helps capture the names of constants. +#. Use ``constexpr`` and ``const`` generously. +#. 
When a ``switch()`` statement's labels do not cover all possible case values + explicitly, it should contain either a ``default:;`` at its end or a + ``default:`` label that obviously crashes; we have a ``CRASH_NO_CASE`` macro + for such situations. +#. On the other hand, when a ``switch()`` statement really does cover all of + the values of an ``enum class``\ , please insert a call to the ``SWITCH_COVERS_ALL_CASES`` + macro at the top of the block. This macro does the right thing for G++ and + clang to ensure that no warning is emitted when the cases are indeed all covered. +#. When using ``std::optional`` values, avoid unprotected access to their content. + This is usually by means of ``x.has_value()`` guarding execution of ``*x``. + This is implicit when they are function results assigned to local variables + in ``if``\ /\ ``while`` predicates. + When no presence test is obviously protecting a ``*x`` reference to the + contents, and it is assumed that the contents are present, validate that + assumption by using ``x.value()`` instead. +#. We use ``c_str()`` rather than ``data()`` when converting a ``std::string`` + to a ``const char *`` when the result is expected to be NUL-terminated. +#. Avoid explicit comparisons of pointers to ``nullptr`` and tests of + presence of ``optional<>`` values with ``.has_value()`` in the predicate + expressions of control flow statements, but prefer them to implicit + conversions to ``bool`` when initializing ``bool`` variables and arguments, + and to the use of the idiom ``!!``. + +Classes +~~~~~~~ + + +#. Define POD structures with ``struct``. +#. Don't use ``this->`` in (non-static) member functions, unless forced to + do so in a template member function. +#. Define accessor and mutator member functions (implicitly) inline in the + class, after constructors and assignments. Don't needlessly define + (implicit) inline member functions in classes unless they really solve a + performance problem. +#.
Try to make class definitions in headers concise specifications of + interfaces, at least to the extent that C++ allows. +#. When copy constructors and copy assignment are not necessary, + and move constructors/assignment is present, don't declare them and they + will be implicitly deleted. When neither copy nor move constructors + or assignments should exist for a class, explicitly ``=delete`` all of them. +#. Make single-argument constructors (other than copy and move constructors) + 'explicit' unless you really want to define an implicit conversion. + +Pointers +~~~~~~~~ + +There are many -- perhaps too many -- means of indirect addressing +data in this project. +Some of these are standard C++ language and library features, +while others are local inventions in ``lib/Common``\ : + + +* Bare pointers (\ ``Foo *p``\ ): these are obviously nullable, non-owning, + undefined when uninitialized, shallowly copyable, reassignable, and often + not the right abstraction to use in this project. + But they can be the right choice to represent an optional + non-owning reference, as in a function result. + Use the ``DEREF()`` macro to convert a pointer to a reference that isn't + already protected by an explicit test for null. +* References (\ ``Foo &r``\ , ``const Foo &r``\ ): non-nullable, not owning, + shallowly copyable, and not reassignable. + References are great for invisible indirection to objects whose lifetimes are + broader than that of the reference. + Take care when initializing a reference with another reference to ensure + that a copy is not made because only one of the references is ``const``\ ; + this is a pernicious C++ language pitfall! +* Rvalue references (\ ``Foo &&r``\ ): These are non-nullable references + *with* ownership, and they are ubiquitously used for formal arguments + wherever appropriate. 
+* ``std::reference_wrapper<>``\ : non-nullable, not owning, shallowly + copyable, and (unlike bare references) reassignable, so suitable for + use in STL containers and for data members in classes that need to be + copyable or assignable. +* ``common::Reference<>``\ : like ``std::reference_wrapper<>``\ , but also supports + move semantics, member access, and comparison for equality; suitable for use in ``std::variant<>``. +* ``std::unique_ptr<>``\ : A nullable pointer with ownership, null by default, + not copyable, reassignable. + F18 has a helpful ``Deleter<>`` class template that makes ``unique_ptr<>`` + easier to use with forward-referenced data types. +* ``std::shared_ptr<>``\ : A nullable pointer with shared ownership via reference + counting, null by default, shallowly copyable, reassignable, and slow. +* ``Indirection<>``\ : A non-nullable pointer with ownership and + optional deep copy semantics; reassignable. + Often better than a reference (due to ownership) or ``std::unique_ptr<>`` + (due to non-nullability and copyability). + Can be wrapped in ``std::optional<>`` when nullability is required. + Usable with forward-referenced data types with some use of ``extern template`` + in headers and explicit template instantiation in source files. +* ``CountedReference<>``\ : A nullable pointer with shared ownership via + reference counting, null by default, shallowly copyable, reassignable. + Safe to use *only* when the data are private to just one + thread of execution. + Used sparingly in place of ``std::shared_ptr<>`` only when the overhead + of that standard feature is prohibitive. + +A feature matrix: + +.. list-table:: + :header-rows: 1 + + * - indirection + - nullable + - default null + - owning + - reassignable + - copyable + - undefined type ok?
+ * - ``*p`` + - yes + - no + - no + - yes + - shallowly + - yes + * - ``&r`` + - no + - n/a + - no + - no + - shallowly + - yes + * - ``&&r`` + - no + - n/a + - yes + - no + - shallowly + - yes + * - ``reference_wrapper<>`` + - no + - n/a + - no + - yes + - shallowly + - yes + * - ``Reference<>`` + - no + - n/a + - no + - yes + - shallowly + - yes + * - ``unique_ptr<>`` + - yes + - yes + - yes + - yes + - no + - yes, with work + * - ``shared_ptr<>`` + - yes + - yes + - yes + - yes + - shallowly + - no + * - ``Indirection<>`` + - no + - n/a + - yes + - yes + - optionally deeply + - yes, with work + * - ``CountedReference<>`` + - yes + - yes + - yes + - yes + - shallowly + - no + + +Overall design preferences +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Don't use dynamic solutions to solve problems that can be solved at +build time; don't solve build time problems by writing programs that +produce source code when macros and templates suffice; don't write macros +when templates suffice. Templates are statically typed, checked by the +compiler, and are (or should be) visible to debuggers. + +Exceptions to these guidelines +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Reasonable exceptions will be allowed; these guidelines cannot anticipate +all situations. +For example, names that come from other sources might be more clear if +their original spellings are preserved rather than mangled to conform +needlessly to the conventions here, as Google's C++ style guide does +in a way that leads to weirdly capitalized abbreviations in names +like ``Http``. +Consistency is one of many aspects in the pursuit of clarity, +but not an end in itself. + +C++ compiler bug workarounds +---------------------------- + +Below is a list of workarounds for C++ compiler bugs met with f18 that, even +if the bugs are fixed in latest C++ compiler versions, need to be applied so +that all desired tool-chains can compile f18. 
+ +Explicitly move noncopyable local variable into optional results +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The following code is legal C++ but fails to compile with the +default Ubuntu 18.04 g++ compiler (7.4.0-1ubuntu1~18.0.4.1): + +.. code-block:: + + class CantBeCopied { + public: + CantBeCopied(const CantBeCopied&) = delete; + CantBeCopied(CantBeCopied&&) = default; + CantBeCopied() {} + }; + std::optional<CantBeCopied> fooNOK() { + CantBeCopied result; + return result; // Legal C++, but does not compile with Ubuntu 18.04 default g++ + } + std::optional<CantBeCopied> fooOK() { + CantBeCopied result; + return {std::move(result)}; // Compiles OK everywhere + } + +The underlying bug is actually not specific to ``std::optional`` but this is the most common +case in f18 where the issue may occur. The actual bug can be reproduced with any class ``B`` +that has a perfect forwarding constructor taking ``CantBeCopied`` as argument: +``template<typename CantBeCopied> B(CantBeCopied&& x) : x_{std::forward<CantBeCopied>(x)} {}``. +In such scenarios, Ubuntu 18.04 g++ fails to instantiate the move constructor +and to construct the returned value as it should; instead, it complains about a +missing copy constructor. + +Local result variables do not need to and should not be explicitly moved into optionals +if they have a copy constructor. diff --git a/flang/documentation/Calls.md b/flang/docs/Calls.rst rename from flang/documentation/Calls.md rename to flang/docs/Calls.rst --- a/flang/documentation/Calls.md +++ b/flang/docs/Calls.rst @@ -1,12 +1,5 @@ - - -## Procedure reference implementation protocol +Procedure reference implementation protocol +------------------------------------------- Fortran function and subroutine references are complicated. This document attempts to collect the requirements imposed by the 2018 @@ -20,7 +13,9 @@ This note does not consider calls to intrinsic procedures, statement functions, or calls to internal runtime support library routines.
-## Quick review of terminology +Quick review of terminology +--------------------------- + * A *dummy argument* is a function or subroutine parameter. It is *associated* with an *effective argument* at each call @@ -33,29 +28,30 @@ * An *explicit-shape* array has all of its bounds specified; lower bounds default to 1. These can be passed by with a single address and their contents are contiguous. -* An *assumed-size* array is an explicit-shape array with `*` as its +* An *assumed-size* array is an explicit-shape array with ``*`` as its final dimension, which is the most-significant one in Fortran and whose value does not affect indexed address calculations. -* A *deferred-shape* array (`DIMENSION::A(:)`) is a `POINTER` or `ALLOCATABLE`. - `POINTER` target data might not be contiguous. -* An *assumed-shape* (not size!) array (`DIMENSION::A(:)`) is a dummy argument - that is neither `POINTER` nor `ALLOCATABLE`; its lower bounds can be set +* A *deferred-shape* array (\ ``DIMENSION::A(:)``\ ) is a ``POINTER`` or ``ALLOCATABLE``. + ``POINTER`` target data might not be contiguous. +* An *assumed-shape* (not size!) array (\ ``DIMENSION::A(:)``\ ) is a dummy argument + that is neither ``POINTER`` nor ``ALLOCATABLE``\ ; its lower bounds can be set by the procedure that receives them (defaulting to 1), and its upper bounds are functions of the lower bounds and the extents of dimensions in the *shape* of the effective argument. -* An *assumed-length* `CHARACTER(*)` dummy argument +* An *assumed-length* ``CHARACTER(*)`` dummy argument takes its length from the effective argument. -* An *assumed-length* `CHARACTER(*)` *result* of an external function (C721) +* An *assumed-length* ``CHARACTER(*)`` *result* of an external function (C721) has its length determined by its eventual declaration in a calling scope. 
-* An *assumed-rank* `DIMENSION::A(..)` dummy argument array has an unknown +* An *assumed-rank* ``DIMENSION::A(..)`` dummy argument array has an unknown number of dimensions. -* A *polymorphic* `CLASS(t)` dummy argument, `ALLOCATABLE`, or `POINTER` +* A *polymorphic* ``CLASS(t)`` dummy argument, ``ALLOCATABLE``\ , or ``POINTER`` has a specific derived type or some extension of that type. - An *unlimited polymorphic* `CLASS(*)` object can have any + An *unlimited polymorphic* ``CLASS(*)`` object can have any intrinsic or derived type. -* *Interoperable* `BIND(C)` procedures are written in C or callable from C. +* *Interoperable* ``BIND(C)`` procedures are written in C or callable from C. -## Interfaces +Interfaces +---------- Referenced procedures may or may not have declared interfaces available to their call sites. @@ -63,31 +59,32 @@ Procedures with some post-Fortran '77 features *require* an explicit interface to be called (15.4.2.2) or even passed (4.3.4(5)): + * use of argument keywords in a call -* procedures that are `ELEMENTAL` or `BIND(C)` -* procedures that are required to be `PURE` due to the context of the call - (specification expression, `DO CONCURRENT`, `FORALL`) -* dummy arguments with these attributes: `ALLOCATABLE`, `POINTER`, - `VALUE`, `TARGET`, `OPTIONAL`, `ASYNCHRONOUS`, `VOLATILE`, - and, as a consequence of limitations on its use, `CONTIGUOUS`; - `INTENT()`, however, does *not* require an explicit interface +* procedures that are ``ELEMENTAL`` or ``BIND(C)`` +* procedures that are required to be ``PURE`` due to the context of the call + (specification expression, ``DO CONCURRENT``\ , ``FORALL``\ ) +* dummy arguments with these attributes: ``ALLOCATABLE``\ , ``POINTER``\ , + ``VALUE``\ , ``TARGET``\ , ``OPTIONAL``\ , ``ASYNCHRONOUS``\ , ``VOLATILE``\ , + and, as a consequence of limitations on its use, ``CONTIGUOUS``\ ; + ``INTENT()``\ , however, does *not* require an explicit interface * dummy arguments that are coarrays * dummy arguments 
that are assumed-shape or assumed-rank arrays * dummy arguments with parameterized derived types * dummy arguments that are polymorphic * function result that is an array -* function result that is `ALLOCATABLE` or `POINTER` -* `CHARACTER` function result whose length is neither constant +* function result that is ``ALLOCATABLE`` or ``POINTER`` +* ``CHARACTER`` function result whose length is neither constant nor assumed -* derived type function result with `LEN` type parameter value that is +* derived type function result with ``LEN`` type parameter value that is not constant (note that result derived type parameters cannot be assumed (C795)) Module procedures, internal procedures, procedure pointers, type-bound procedures, and recursive references by a procedure to itself always have explicit interfaces. -(Consequently, they cannot be assumed-length `CHARACTER(*)` functions; -conveniently, assumed-length `CHARACTER(*)` functions are prohibited from +(Consequently, they cannot be assumed-length ``CHARACTER(*)`` functions; +conveniently, assumed-length ``CHARACTER(*)`` functions are prohibited from recursion (15.6.2.1(3))). Other uses of procedures besides calls may also require explicit interfaces, @@ -96,21 +93,22 @@ Note that non-parameterized monomorphic derived type arguments do *not* by themselves require the use of an explicit interface. However, dummy arguments with any derived type parameters *do* -require an explicit interface, even if they are all `KIND` type +require an explicit interface, even if they are all ``KIND`` type parameters. -15.5.2.9(2) explicitly allows an assumed-length `CHARACTER(*)` function +15.5.2.9(2) explicitly allows an assumed-length ``CHARACTER(*)`` function to be passed as an actual argument to an explicit-length dummy; this has implications for calls to character-valued dummy functions and function pointers. 
-(In the scopes that reference `CHARACTER` functions, they must have +(In the scopes that reference ``CHARACTER`` functions, they must have visible definitions with explicit result lengths.) -### Implicit interfaces +Implicit interfaces +^^^^^^^^^^^^^^^^^^^ In the absence of any characteristic or context that *requires* an explicit interface (see above), an external function or subroutine (R503) -or `ENTRY` (R1541) can be called directly or indirectly via its implicit interface. +or ``ENTRY`` (R1541) can be called directly or indirectly via its implicit interface. Each of the arguments can be passed as a simple address, including dummy procedures. Procedures that *can* be called via an implicit interface can @@ -133,12 +131,12 @@ Note that F77ish functions still have known result types, possibly by means of implicit typing of their names. -They can also be `CHARACTER(*)` assumed-length character functions. +They can also be ``CHARACTER(*)`` assumed-length character functions. In other words: these F77sh procedures that do not require the use of an explicit interface and that can possibly be referenced, directly or indirectly, with implicit interfaces are limited to argument lists that comprise -only the addresses of effective arguments and the length of a `CHARACTER` function result +only the addresses of effective arguments and the length of a ``CHARACTER`` function result (when there is one), and they can return only scalar values with constant type parameter values. None of their arguments or results need be (or can be) implemented @@ -147,13 +145,14 @@ simple addresses of non-internal subprograms or trampolines for internal procedures. -Note that the `INTENT` attribute does not, by itself, +Note that the ``INTENT`` attribute does not, by itself, require the use of explicit interface; neither does the use of a dummy procedure (implicit or explicit in their interfaces). 
So the analysis of calls to F77ish procedures must allow for the -invisible use of `INTENT(OUT)`. +invisible use of ``INTENT(OUT)``. -## Protocol overview +Protocol overview +----------------- Here is a summary script of all of the actions that may need to be taken by the calling procedure and its referenced procedure to effect @@ -162,90 +161,103 @@ The order of these steps is not particularly strict, and we have some design alternatives that are explored further below. -### Before the call: +Before the call: +^^^^^^^^^^^^^^^^ + -1. Compute &/or copy into temporary storage the values of +#. Compute &/or copy into temporary storage the values of some effective argument expressions and designators (see below). -1. Create and populate descriptors for arguments that use them +#. Create and populate descriptors for arguments that use them (see below). -1. Possibly allocate function result storage, +#. Possibly allocate function result storage, when its size can be known by all callers; function results that are - neither `POINTER` nor `ALLOCATABLE` must have explicit shapes (C816). -1. Create and populate a descriptor for the function result, if it - needs one (deferred-shape/-length `POINTER`, any `ALLOCATABLE`, + neither ``POINTER`` nor ``ALLOCATABLE`` must have explicit shapes (C816). +#. Create and populate a descriptor for the function result, if it + needs one (deferred-shape/-length ``POINTER``\ , any ``ALLOCATABLE``\ , derived type with non-constant length parameters, &c.). -1. Capture the values of host-escaping local objects in memory; +#. Capture the values of host-escaping local objects in memory; package them into single address (for calls to internal procedures & for calls that pass internal procedures as arguments). -1. Resolve the target procedure's polymorphic binding, if any. -1. Marshal effective argument addresses (or values for `%VAL()` and some - discretionary `VALUE` arguments) into registers. -1.
Marshal `CHARACTER` argument lengths in additional value arguments for - `CHARACTER` effective arguments not passed via descriptors. +#. Resolve the target procedure's polymorphic binding, if any. +#. Marshal effective argument addresses (or values for ``%VAL()`` and some + discretionary ``VALUE`` arguments) into registers. +#. Marshal ``CHARACTER`` argument lengths in additional value arguments for + ``CHARACTER`` effective arguments not passed via descriptors. These lengths must be 64-bit integers. -1. Marshal an extra argument for the length of a `CHARACTER` function +#. Marshal an extra argument for the length of a ``CHARACTER`` function result if the function is F77ish. -1. Marshal an extra argument for the function result's descriptor, +#. Marshal an extra argument for the function result's descriptor, if it needs one. -1. Set the "host instance" (static link) register when calling an internal +#. Set the "host instance" (static link) register when calling an internal procedure from its host or another internal procedure, a procedure pointer, or dummy procedure (when it has a descriptor). -1. Jump. +#. Jump. -### On entry: -1. For subprograms with alternate `ENTRY` points: shuffle `ENTRY` dummy arguments +On entry: +^^^^^^^^^ + + +#. For subprograms with alternate ``ENTRY`` points: shuffle ``ENTRY`` dummy arguments set a compiler-generated variable to identify the alternate entry point, - and jump to the common entry point for common processing and a `switch()` - to the statement after the `ENTRY`. -1. Capture `CHARACTER` argument &/or assumed-length result length values. -1. Complete `VALUE` copying if this step will not always be done + and jump to the common entry point for common processing and a ``switch()`` + to the statement after the ``ENTRY``. +#. Capture ``CHARACTER`` argument &/or assumed-length result length values. +#. Complete ``VALUE`` copying if this step will not always be done by the caller (as I think it should be). -1. 
Finalize &/or re-initialize `INTENT(OUT)` non-pointer +#. Finalize &/or re-initialize ``INTENT(OUT)`` non-pointer effective arguments (see below). -1. For interoperable procedures called from C: compact discontiguous - dummy argument values when necessary (`CONTIGUOUS` &/or - explicit-shape/assumed-size arrays of assumed-length `CHARACTER(*)`). -1. Optionally compact assumed-shape arguments for contiguity on one +#. For interoperable procedures called from C: compact discontiguous + dummy argument values when necessary (\ ``CONTIGUOUS`` &/or + explicit-shape/assumed-size arrays of assumed-length ``CHARACTER(*)``\ ). +#. Optionally compact assumed-shape arguments for contiguity on one or more leading dimensions to improve SIMD vectorization, if not - `TARGET` and not already sufficiently contiguous. + ``TARGET`` and not already sufficiently contiguous. (PGI does this in the caller, whether the callee needs it or not.) -1. Complete allocation of function result storage, if that has +#. Complete allocation of function result storage, if that has not been done by the caller. -1. Initialize components of derived type local variables, +#. Initialize components of derived type local variables, including the function result. Execute the callee, populating the function result or selecting the subroutine's alternate return. -### On exit: -1. Clean up local scope (finalization, deallocation) -1. Deallocate `VALUE` argument temporaries. +On exit: +^^^^^^^^ + + +#. Clean up local scope (finalization, deallocation) +#. Deallocate ``VALUE`` argument temporaries. (But don't finalize them; see 7.5.6.3(3)). -1. Replace any assumed-shape argument data that were compacted on +#. Replace any assumed-shape argument data that were compacted on entry for contiguity when the data were possibly - modified across the call (never when `INTENT(IN)` or `VALUE`). -1. Identify alternate `RETURN` to caller. -1. Marshal results. -1. Jump - -### On return to the caller: -1. 
Save the result registers, if any. -1. Copy effective argument array designator data that was copied into + modified across the call (never when ``INTENT(IN)`` or ``VALUE``\ ). +#. Identify alternate ``RETURN`` to caller. +#. Marshal results. +#. Jump + +On return to the caller: +^^^^^^^^^^^^^^^^^^^^^^^^ + + +#. Save the result registers, if any. +#. Copy effective argument array designator data that was copied into a temporary back into its original storage (see below). -1. Complete deallocation of effective argument temporaries (not `VALUE`). -1. Reload definable host-escaping local objects from memory, if they +#. Complete deallocation of effective argument temporaries (not ``VALUE``\ ). +#. Reload definable host-escaping local objects from memory, if they were saved to memory by the host before the call. -1. `GO TO` alternate return, if any. -1. Use the function result in an expression. -1. Eventually, finalize &/or deallocate the function result. +#. ``GO TO`` alternate return, if any. +#. Use the function result in an expression. +#. Eventually, finalize &/or deallocate the function result. (I've omitted some obvious steps, like preserving/restoring callee-saved registers on entry/exit, dealing with caller-saved registers before/after calls, and architecture-dependent ABI requirements.) -## The messy details +The messy details +----------------- -### Copying effective argument values into temporary storage +Copying effective argument values into temporary storage +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ There are several conditions that require the compiler to generate code that allocates and populates temporary storage for an actual @@ -254,38 +266,39 @@ First, effective arguments that are expressions, not designators, obviously need to be computed and captured into memory in order to be passed by reference. 
-This includes parenthesized designators like `(X)`, which are +This includes parenthesized designators like ``(X)``\ , which are expressions in Fortran, as an important special case. (This case also technically includes unparenthesized constants, but those are better implemented by passing addresses in read-only memory.) -The dummy argument cannot be known to have `INTENT(OUT)` or -`INTENT(IN OUT)`. +The dummy argument cannot be known to have ``INTENT(OUT)`` or +``INTENT(IN OUT)``. -Small scalar or elemental `VALUE` arguments may be passed in registers, -as should arguments wrapped in the legacy VMS `%VAL()` notation. -Multiple elemental `VALUE` arguments might be packed into SIMD registers. +Small scalar or elemental ``VALUE`` arguments may be passed in registers, +as should arguments wrapped in the legacy VMS ``%VAL()`` notation. +Multiple elemental ``VALUE`` arguments might be packed into SIMD registers. Effective arguments that are designators, not expressions, must also be copied into temporaries in the following situations. -1. Coindexed objects need to be copied into the local image. - This can get very involved if they contain `ALLOCATABLE` + +#. Coindexed objects need to be copied into the local image. + This can get very involved if they contain ``ALLOCATABLE`` components, which also need to be copied, along with their - `ALLOCATABLE` components, and may be best implemented with a runtime + ``ALLOCATABLE`` components, and may be best implemented with a runtime library routine working off a description of the type. -1. Effective arguments associated with dummies with the `VALUE` +#. Effective arguments associated with dummies with the ``VALUE`` attribute need to be copied; this can be done on either side of the call, but there are optimization opportunities available when the caller's side bears the responsibility. -1. In non-elemental calls, the values of array sections with +#. 
In non-elemental calls, the values of array sections with vector-valued subscripts need to be gathered into temporaries. These effective arguments are not definable, and they are not allowed to - be associated with non-`VALUE` dummy arguments with the attributes - `INTENT(OUT)`, `INTENT(IN OUT)`, `ASYNCHRONOUS`, or `VOLATILE` - (15.5.2.4(21)); `INTENT()` can't always be checked. -1. Non-simply-contiguous (9.5.4) arrays being passed to non-`POINTER` - dummy arguments that must be contiguous (due to a `CONTIGUOUS` + be associated with non-\ ``VALUE`` dummy arguments with the attributes + ``INTENT(OUT)``\ , ``INTENT(IN OUT)``\ , ``ASYNCHRONOUS``\ , or ``VOLATILE`` + (15.5.2.4(21)); ``INTENT()`` can't always be checked. +#. Non-simply-contiguous (9.5.4) arrays being passed to non-\ ``POINTER`` + dummy arguments that must be contiguous (due to a ``CONTIGUOUS`` attribute, or not being assumed-shape or assumed-rank; this is always the case for F77ish procedures). This should be a runtime decision, so that effective arguments @@ -293,7 +306,7 @@ This rule does not apply to coarray dummies, whose effective arguments are required to be simply contiguous when this rule would otherwise force the use of a temporary (15.5.2.8); neither does it apply - to `ASYNCHRONOUS` and `VOLATILE` effective arguments, which are + to ``ASYNCHRONOUS`` and ``VOLATILE`` effective arguments, which are disallowed when copies would be necessary (C1538 - C1540). *Only temporaries created by this contiguity requirement are candidates for being copied back to the original variable after @@ -301,7 +314,7 @@ Fortran requires (18.3.6(5)) that calls to interoperable procedures with dummy argument arrays with contiguity requirements -handle the compaction of discontiguous data *in the Fortran callee*, +handle the compaction of discontiguous data *in the Fortran callee*\ , at least when called from C. And discontiguous data must be compacted on the *caller's* side when passed from Fortran to C (18.3.6(6)). 
@@ -314,32 +327,32 @@ discontiguity in the callee can be avoided by using a caller-compaction convention when we have the freedom to choose. -While we are unlikely to want to _needlessly_ use a temporary for +While we are unlikely to want to *needlessly* use a temporary for an effective argument that does not require one for any of these reasons above, we are specifically disallowed from doing so by the standard in cases where pointers to the original target data are required to be valid across the call (15.5.2.4(9-10)). In particular, compaction of assumed-shape arrays for discretionary contiguity on the leading dimension to ease SIMD vectorization -cannot be done safely for `TARGET` dummies without `VALUE`. +cannot be done safely for ``TARGET`` dummies without ``VALUE``. -Effective arguments associated with known `INTENT(OUT)` dummies that +Effective arguments associated with known ``INTENT(OUT)`` dummies that require allocation of a temporary -- and this can only be for reasons of contiguity -- don't have to populate it, but they do have to perform -minimal initialization of any `ALLOCATABLE` components so that +minimal initialization of any ``ALLOCATABLE`` components so that the runtime doesn't crash when the callee finalizes and deallocates them. -`ALLOCATABLE` coarrays are prohibited from being affected by `INTENT(OUT)` +``ALLOCATABLE`` coarrays are prohibited from being affected by ``INTENT(OUT)`` (see C846). Note that calls to implicit interfaces must conservatively allow -for the use of `INTENT(OUT)` by the callee. +for the use of ``INTENT(OUT)`` by the callee. -Except for `VALUE` and known `INTENT(IN)` dummy arguments, the original +Except for ``VALUE`` and known ``INTENT(IN)`` dummy arguments, the original contents of local designators that have been compacted into temporaries -could optionally have their `ALLOCATABLE` components invalidated +could optionally have their ``ALLOCATABLE`` components invalidated across the call as an aid to debugging. 
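The contiguity-driven temporaries discussed in this section can be illustrated with a small C++ sketch of caller-side copy-in and copy-out. Everything named here (``StridedSection``, ``CopyIn``, ``CopyOut``, ``DoubleAll``, ``CallWithCompaction``) is hypothetical and invented for illustration; it is not the f18 runtime interface, and a real implementation would decide at run time whether compaction is needed at all.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical view of a strided rank-1 section such as A(1:N:2).
struct StridedSection {
  double *base;       // address of the first selected element
  std::size_t count;  // number of selected elements
  std::size_t stride; // distance between selected elements, in elements
};

// Copy-in: gather the section into a contiguous temporary.
// (For a known INTENT(OUT) dummy only the storage would be needed.)
inline std::vector<double> CopyIn(const StridedSection &s) {
  std::vector<double> temp(s.count);
  for (std::size_t j{0}; j < s.count; ++j) {
    temp[j] = s.base[j * s.stride];
  }
  return temp;
}

// Copy-out: scatter the temporary back into the original storage.
// This step is skipped for VALUE and known INTENT(IN) dummies.
inline void CopyOut(const std::vector<double> &temp, const StridedSection &s) {
  assert(temp.size() == s.count);
  for (std::size_t j{0}; j < temp.size(); ++j) {
    s.base[j * s.stride] = temp[j];
  }
}

// Stand-in for a callee whose dummy array must be contiguous.
inline void DoubleAll(double *x, std::size_t n) {
  for (std::size_t j{0}; j < n; ++j) {
    x[j] *= 2.0;
  }
}

// Caller-side protocol: compact, call, then copy back.
inline void CallWithCompaction(const StridedSection &s) {
  std::vector<double> temp = CopyIn(s);
  DoubleAll(temp.data(), temp.size());
  CopyOut(temp, s);
}
```

With a six-element array and a section of stride 2, only the three selected elements are changed by the call; the elements between them are left alone, which is the property that copy-out must preserve.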
-Except for `VALUE` and known `INTENT(IN)` dummy arguments, the contents of +Except for ``VALUE`` and known ``INTENT(IN)`` dummy arguments, the contents of the temporary storage will be copied back into the effective argument designator after control returns from the procedure, and it may be necessary to preserve addresses (or the values of subscripts and cosubscripts @@ -347,50 +360,54 @@ elements, in additional temporary storage if they can't be safely or quickly recomputed after the call. -### `INTENT(OUT)` preparation +``INTENT(OUT)`` preparation +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Effective arguments that are associated with `INTENT(OUT)` +Effective arguments that are associated with ``INTENT(OUT)`` dummy arguments are required to be definable. -This cannot always be checked, as the use of `INTENT(OUT)` +This cannot always be checked, as the use of ``INTENT(OUT)`` does not by itself mandate the use of an explicit interface. -`INTENT(OUT)` arguments are finalized (as if) on entry to the called +``INTENT(OUT)`` arguments are finalized (as if) on entry to the called procedure. In particular, in calls to elemental procedures, the elements of an array are finalized by a scalar or elemental -`FINAL` procedure (7.5.6.3(7)). +``FINAL`` procedure (7.5.6.3(7)). -Derived type components that are `ALLOCATABLE` are finalized +Derived type components that are ``ALLOCATABLE`` are finalized and deallocated; they are prohibited from being coarrays. Components with initializers are (re)initialized. -The preparation of effective arguments for `INTENT(OUT)` could be +The preparation of effective arguments for ``INTENT(OUT)`` could be done on either side of the call. If the preparation is done by the caller, there is an optimization opportunity -in situations where unmodified incoming `INTENT(OUT)` dummy -arguments whose types lack `FINAL` procedures are being passed -onward as outgoing `INTENT(OUT)` arguments. 
+in situations where unmodified incoming ``INTENT(OUT)`` dummy +arguments whose types lack ``FINAL`` procedures are being passed +onward as outgoing ``INTENT(OUT)`` arguments. -### Arguments and function results requiring descriptors +Arguments and function results requiring descriptors +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Dummy arguments are represented with the addresses of new descriptors when they have any of the following characteristics: -1. assumed-shape array (`DIMENSION::A(:)`) -1. assumed-rank array (`DIMENSION::A(..)`) -1. parameterized derived type with assumed `LEN` parameters -1. polymorphic (`CLASS(T)`, `CLASS(*)`) -1. assumed-type (`TYPE(*)`) -1. coarray dummy argument -1. `INTENT(IN) POINTER` argument (15.5.2.7, C.10.4) -`ALLOCATABLE` and other `POINTER` arguments can be passed by simple +#. assumed-shape array (\ ``DIMENSION::A(:)``\ ) +#. assumed-rank array (\ ``DIMENSION::A(..)``\ ) +#. parameterized derived type with assumed ``LEN`` parameters +#. polymorphic (\ ``CLASS(T)``\ , ``CLASS(*)``\ ) +#. assumed-type (\ ``TYPE(*)``\ ) +#. coarray dummy argument +#. ``INTENT(IN) POINTER`` argument (15.5.2.7, C.10.4) + +``ALLOCATABLE`` and other ``POINTER`` arguments can be passed by simple address. Non-F77ish procedures use descriptors to represent two further kinds of dummy arguments: -1. assumed-length `CHARACTER(*)` -1. dummy procedures + +#. assumed-length ``CHARACTER(*)`` +#. dummy procedures F77ish procedures use other means to convey character length and host instance links (respectively) for these arguments. @@ -399,31 +416,32 @@ a caller-supplied descriptor when they have any of the following characteristics, some which necessitate an explicit interface: -1. deferred-shape array (so `ALLOCATABLE` or `POINTER`) -1. derived type with any non-constant `LEN` parameter + +#. deferred-shape array (so ``ALLOCATABLE`` or ``POINTER``\ ) +#. derived type with any non-constant ``LEN`` parameter (C795 prohibit assumed lengths) -1. 
procedure pointer result (when the interface must be explicit) +#. procedure pointer result (when the interface must be explicit) Storage for a function call's result is allocated by the caller when -possible: the result is neither `ALLOCATABLE` nor `POINTER`, -the shape is scalar or explicit, and the type has `LEN` parameters +possible: the result is neither ``ALLOCATABLE`` nor ``POINTER``\ , +the shape is scalar or explicit, and the type has ``LEN`` parameters that are constant expressions. In other words, the result doesn't require the use of a descriptor but can't be returned in registers. This allows a function result to be written directly into a local variable or temporary when it is safe to treat the variable as if -it were an additional `INTENT(OUT)` argument. -(Storage for `CHARACTER` results, assumed or explicit, is always +it were an additional ``INTENT(OUT)`` argument. +(Storage for ``CHARACTER`` results, assumed or explicit, is always allocated by the caller, and the length is always passed so that an assumed-length external function will work when eventually called from a scope that declares the length that it will use (15.5.2.9 (2)).) -Note that the lower bounds of the dimensions of non-`POINTER` -non-`ALLOCATABLE` dummy argument arrays are determined by the +Note that the lower bounds of the dimensions of non-\ ``POINTER`` +non-\ ``ALLOCATABLE`` dummy argument arrays are determined by the callee, not the caller. -(A Fortran pitfall: declaring `A(0:9)`, passing it to a dummy -array `D(:)`, and assuming that `LBOUND(D,1)` will be zero +(A Fortran pitfall: declaring ``A(0:9)``\ , passing it to a dummy +array ``D(:)``\ , and assuming that ``LBOUND(D,1)`` will be zero in the callee.) If the declaration of an assumed-shape dummy argument array contains an explicit lower bound expression (R819), its value @@ -432,14 +450,15 @@ as long as we assume that argument descriptors can be modified by callees. 
Callers should fill in all of the fields of outgoing -non-`POINTER` non-`ALLOCATABLE` argument +non-\ ``POINTER`` non-\ ``ALLOCATABLE`` argument descriptors with the assumption that the callee will use 1 for lower bound values, and callees can rely on them being 1 if not modified. -### Copying temporary storage back into argument designators +Copying temporary storage back into argument designators +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -Except for `VALUE` and known `INTENT(IN)` dummy arguments and array sections +Except for ``VALUE`` and known ``INTENT(IN)`` dummy arguments and array sections with vector-valued subscripts (15.5.2.4(21)), temporary storage into which effective argument data were compacted for contiguity before the call must be redistributed back to its original storage by the caller after @@ -450,16 +469,17 @@ temporary data should be redistributed; the descriptor need not be fully populated with type information. -Note that coindexed objects with `ALLOCATABLE` ultimate components +Note that coindexed objects with ``ALLOCATABLE`` ultimate components are required to be associated only with dummy arguments with the -`VALUE` &/or `INTENT(IN)` attributes (15.6.2.4(6)), so there is no +``VALUE`` &/or ``INTENT(IN)`` attributes (15.6.2.4(6)), so there is no requirement that the local image somehow reallocate remote storage when copying the data back. -### Polymorphic bindings +Polymorphic bindings +^^^^^^^^^^^^^^^^^^^^ Calls to the type-bound procedures of monomorphic types are -resolved at compilation time, as are calls to `NON_OVERRIDABLE` +resolved at compilation time, as are calls to ``NON_OVERRIDABLE`` type-bound procedures. The resolution of calls to overridable type-bound procedures of polymorphic types must be completed at execution (generic resolution @@ -474,7 +494,8 @@ copying, finalization, and I/O of type instances). Each overridable type-bound procedure in the type corresponds to an index into this table. 
-### Host instance linkage +Host instance linkage +^^^^^^^^^^^^^^^^^^^^^ Calls to dummy procedures and procedure pointers that resolve to internal procedures need to pass an additional "host instance" argument that @@ -485,8 +506,8 @@ languages with nested subprograms, although Fortran only allows one level of nesting. The 64-bit x86 and little-endian OpenPower ABIs reserve registers -for this purpose (`%r10` & `R11`); 64-bit ARM has a reserved register -that can be used (`x18`). +for this purpose (\ ``%r10`` & ``R11``\ ); 64-bit ARM has a reserved register +that can be used (\ ``x18``\ ). The host subprogram objects that are visible to any of their internal subprograms need to be resident in memory across any calls to them @@ -531,14 +552,15 @@ to avoid the overhead of constructing and reclaiming a trampoline. Procedure descriptors can also support multiple code addresses. -### Naming +Naming +^^^^^^ -External subroutines and functions (R503) and `ENTRY` points (R1541) -with `BIND(C)` (R808) have linker-visible names that are either explicitly +External subroutines and functions (R503) and ``ENTRY`` points (R1541) +with ``BIND(C)`` (R808) have linker-visible names that are either explicitly specified in the program or determined by straightforward rules. The names of other F77ish external procedures should respect the conventions of the target architecture for legacy Fortran '77 programs; this is typically -something like `foo_`. +something like ``foo_``. In other cases, however, we have fewer constraints on external naming, as well as some additional requirements and goals. @@ -549,10 +571,10 @@ Note that submodule names are distinct in their modules, not hierarchical, so at most two levels of qualification are needed. 
-Pure `ELEMENTAL` functions (15.8) must use distinct names for any alternate +Pure ``ELEMENTAL`` functions (15.8) must use distinct names for any alternate entry points used for packed SIMD arguments of various widths if we support calls to these functions in SIMD parallel contexts. -There are already conventions for these names in `libpgmath`. +There are already conventions for these names in ``libpgmath``. The names of non-F77ish external procedures should be distinguished as such so that incorrect attempts to call or pass @@ -576,104 +598,135 @@ (so long as there's a way to cope with extension names that don't begin with letters). -In particular, the period (`.`) seems safe to use as a separator character, -so a `Fa.` prefix can serve to isolate these discretionary names from +In particular, the period (\ ``.``\ ) seems safe to use as a separator character, +so a ``Fa.`` prefix can serve to isolate these discretionary names from other uses and to identify the earliest link-compatible version. -For examples: `Fa.mod.foo`, `Fa.mod.submod.foo`, and (for an external -subprogram that requires an explicit interface) `Fa.foo`. +For examples: ``Fa.mod.foo``\ , ``Fa.mod.submod.foo``\ , and (for an external +subprogram that requires an explicit interface) ``Fa.foo``. When the ABI changes in the future in an incompatible way, the -initial prefix becomes `Fb.`, `Fc.`, &c. +initial prefix becomes ``Fb.``\ , ``Fc.``\ , &c. -## Summary of checks to be enforced in semantics analysis +Summary of checks to be enforced in semantics analysis +------------------------------------------------------ -8.5.10 `INTENT` attributes -* (C846) An `INTENT(OUT)` argument shall not be associated with an +8.5.10 ``INTENT`` attributes + + +* (C846) An ``INTENT(OUT)`` argument shall not be associated with an object that is or has an allocatable coarray. -* (C847) An `INTENT(OUT)` argument shall not have `LOCK_TYPE` or `EVENT_TYPE`. 
+* (C847) An ``INTENT(OUT)`` argument shall not have ``LOCK_TYPE`` or ``EVENT_TYPE``. + +8.5.18 ``VALUE`` attribute + -8.5.18 `VALUE` attribute * (C863) The argument cannot be assumed-size, a coarray, or have a coarray ultimate component. -* (C864) The argument cannot be `ALLOCATABLE`, `POINTER`, `INTENT(OUT)`, - `INTENT(IN OUT)`, or `VOLATILE`. -* (C865) If the procedure is `BIND(C)`, the argument cannot be `OPTIONAL`. +* (C864) The argument cannot be ``ALLOCATABLE``\ , ``POINTER``\ , ``INTENT(OUT)``\ , + ``INTENT(IN OUT)``\ , or ``VOLATILE``. +* (C865) If the procedure is ``BIND(C)``\ , the argument cannot be ``OPTIONAL``. 15.5.1 procedure references: -* (C1533) can't pass non-intrinsic `ELEMENTAL` as argument + + +* (C1533) can't pass non-intrinsic ``ELEMENTAL`` as argument * (C1536) alternate return labels must be in the inclusive scope -* (C1537) coindexed argument cannot have a `POINTER` ultimate component +* (C1537) coindexed argument cannot have a ``POINTER`` ultimate component + +15.5.2.4 requirements for non-\ ``POINTER`` non-\ ``ALLOCATABLE`` dummies: + -15.5.2.4 requirements for non-`POINTER` non-`ALLOCATABLE` dummies: * (2) dummy must be monomorphic for coindexed polymorphic actual * (2) dummy must be polymorphic for assumed-size polymorphic actual -* (2) dummy cannot be `TYPE(*)` if effective is PDT or has TBPs or `FINAL` +* (2) dummy cannot be ``TYPE(*)`` if effective is PDT or has TBPs or ``FINAL`` * (4) character length of effective cannot be less than dummy -* (6) coindexed effective with `ALLOCATABLE` ultimate component requires - `INTENT(IN)` &/or `VALUE` dummy +* (6) coindexed effective with ``ALLOCATABLE`` ultimate component requires + .. 
code-block:: + + `INTENT(IN)` &/or `VALUE` dummy + * (13) a coindexed scalar effective requires a scalar dummy * (14) a non-conindexed scalar effective usually requires a scalar dummy, but there are some exceptions that allow elements of storage sequences to be passed and treated like explicit-shape or assumed-size arrays (see 15.5.2.11) * (16) array rank agreement -* (20) `INTENT(OUT)` & `INTENT(IN OUT)` dummies require definable actuals +* (20) ``INTENT(OUT)`` & ``INTENT(IN OUT)`` dummies require definable actuals * (21) array sections with vector subscripts can't be passed to definable dummies - (`INTENT(OUT)`, `INTENT(IN OUT)`, `ASYNCHRONOUS`, `VOLATILE`) -* (22) `VOLATILE` attributes must match when dummy has a coarray ultimate component -* (C1538 - C1540) checks for `ASYNCHRONOUS` and `VOLATILE` + .. code-block:: + + (`INTENT(OUT)`, `INTENT(IN OUT)`, `ASYNCHRONOUS`, `VOLATILE`) -15.5.2.5 requirements for `ALLOCATABLE` & `POINTER` arguments when both +* (22) ``VOLATILE`` attributes must match when dummy has a coarray ultimate component +* (C1538 - C1540) checks for ``ASYNCHRONOUS`` and ``VOLATILE`` + +15.5.2.5 requirements for ``ALLOCATABLE`` & ``POINTER`` arguments when both the dummy and effective arguments have the same attributes: + + * (2) both or neither can be polymorphic * (2) both are unlimited polymorphic or both have the same declared type * (3) rank compatibility * (4) effective argument must have deferred the same type parameters as the dummy -15.5.2.6 `ALLOCATABLE` dummy arguments: -* (2) effective must be `ALLOCATABLE` +15.5.2.6 ``ALLOCATABLE`` dummy arguments: + + +* (2) effective must be ``ALLOCATABLE`` * (3) corank must match -* (4) coindexed effective requires `INTENT(IN)` dummy -* (7) `INTENT(OUT)` & `INTENT(IN OUT)` dummies require definable actuals +* (4) coindexed effective requires ``INTENT(IN)`` dummy +* (7) ``INTENT(OUT)`` & ``INTENT(IN OUT)`` dummies require definable actuals -15.5.2.7 `POINTER` dummy arguments: -* (C1541) 
`CONTIGUOUS` dummy requires simply contiguous actual +15.5.2.7 ``POINTER`` dummy arguments: + + +* (C1541) ``CONTIGUOUS`` dummy requires simply contiguous actual * (C1542) effective argument cannot be coindexed unless procedure is intrinsic -* (2) effective argument must be `POINTER` unless dummy is `INTENT(IN)` and +* (2) effective argument must be ``POINTER`` unless dummy is ``INTENT(IN)`` and effective could be the right-hand side of a pointer assignment statement 15.5.2.8 corray dummy arguments: + + * (1) effective argument must be coarray -* (1) `VOLATILE` attributes must match +* (1) ``VOLATILE`` attributes must match * (2) explicitly or implicitly contiguous dummy array requires a simply contiguous actual 15.5.2.9 dummy procedures: + + * (1) explicit dummy procedure interface must have same characteristics as actual -* (5) dummy procedure `POINTER` requirements on effective arguments +* (5) dummy procedure ``POINTER`` requirements on effective arguments 15.6.2.1 procedure definitions: -* `NON_RECURSIVE` procedures cannot recurse. -* Assumed-length `CHARACTER(*)` functions cannot be declared as `RECURSIVE`, array-valued, - `POINTER`, `ELEMENTAL`, or `PURE' (C723), and cannot be called recursively (15.6.2.1(3)). + + +* ``NON_RECURSIVE`` procedures cannot recurse. +* Assumed-length ``CHARACTER(*)`` functions cannot be declared as ``RECURSIVE``\ , array-valued, + ``POINTER``\ , ``ELEMENTAL``\ , or ``PURE`` (C723), and cannot be called recursively (15.6.2.1(3)). * (C823) A function result cannot be a coarray or contain a coarray ultimate component. -`PURE` requirements (15.7): C1583 - C1599. -These also apply to `ELEMENTAL` procedures that are not `IMPURE`. +``PURE`` requirements (15.7): C1583 - C1599. +These also apply to ``ELEMENTAL`` procedures that are not ``IMPURE``. 
-`ELEMENTAL` requirements (15.8.1): C15100-C15103, +``ELEMENTAL`` requirements (15.8.1): C15100-C15103, and C1533 (can't pass as effective argument unless intrinsic) For interoperable procedures and interfaces (18.3.6): + + * C1552 - C1559 * function result is scalar and of interoperable type (C1553, 18.3.1-3) -* `VALUE` arguments are scalar and of interoperable type -* `POINTER` dummies cannot be `CONTIGUOUS` (18.3.6 paragraph 2(5)) -* assumed-type dummies cannot be `ALLOCATABLE`, `POINTER`, assumed-shape, or assumed-rank (18.3.6 paragraph 2 (5)) -* `CHARACTER` dummies that are `ALLOCATABLE` or `POINTER` must be deferred-length +* ``VALUE`` arguments are scalar and of interoperable type +* ``POINTER`` dummies cannot be ``CONTIGUOUS`` (18.3.6 paragraph 2(5)) +* assumed-type dummies cannot be ``ALLOCATABLE``\ , ``POINTER``\ , assumed-shape, or assumed-rank (18.3.6 paragraph 2 (5)) +* ``CHARACTER`` dummies that are ``ALLOCATABLE`` or ``POINTER`` must be deferred-length + +Further topics to document +-------------------------- -## Further topics to document * Alternate return specifiers -* `%VAL()`, `%REF()`, and `%DESCR()` legacy VMS interoperability extensions +* ``%VAL()``\ , ``%REF()``\ , and ``%DESCR()`` legacy VMS interoperability extensions * Unrestricted specific intrinsic functions as effective arguments -* SIMD variants of `ELEMENTAL` procedures (& unrestricted specific intrinsics) +* SIMD variants of ``ELEMENTAL`` procedures (& unrestricted specific intrinsics) * Elemental subroutine calls with array arguments diff --git a/flang/documentation/Character.md b/flang/docs/Character.rst rename from flang/documentation/Character.md rename to flang/docs/Character.rst --- a/flang/documentation/Character.md +++ b/flang/docs/Character.rst @@ -1,147 +1,163 @@ - +.. 
raw:: html -## Implementation of `CHARACTER` types in f18 + + + + +Implementation of ``CHARACTER`` types in f18 +------------------------------------------------ + +Kinds and Character Sets +^^^^^^^^^^^^^^^^^^^^^^^^ The f18 compiler and runtime support three kinds of the intrinsic -`CHARACTER` type of Fortran 2018. -The default (`CHARACTER(KIND=1)`) holds 8-bit character codes; -`CHARACTER(KIND=2)` holds 16-bit character codes; -and `CHARACTER(KIND=4)` holds 32-bit character codes. +``CHARACTER`` type of Fortran 2018. +The default (\ ``CHARACTER(KIND=1)``\ ) holds 8-bit character codes; +``CHARACTER(KIND=2)`` holds 16-bit character codes; +and ``CHARACTER(KIND=4)`` holds 32-bit character codes. We assume that code values 0 through 127 correspond to -the 7-bit ASCII character set (ISO-646) in every kind of `CHARACTER`. +the 7-bit ASCII character set (ISO-646) in every kind of ``CHARACTER``. This is a valid assumption for Unicode (UCS == ISO/IEC-10646), ISO-8859, and many legacy character sets and interchange formats. -`CHARACTER` data in memory and unformatted files are not in an +``CHARACTER`` data in memory and unformatted files are not in an interchange representation (like UTF-8, Shift-JIS, EUC-JP, or a JIS X). Each character's code in memory occupies a 1-, 2-, or 4- byte word and substrings can be indexed with simple arithmetic. -In formatted I/O, however, `CHARACTER` data may be assumed to use +In formatted I/O, however, ``CHARACTER`` data may be assumed to use the UTF-8 variable-length encoding when it is selected with -`OPEN(ENCODING='UTF-8')`. +``OPEN(ENCODING='UTF-8')``. -`CHARACTER(KIND=1)` literal constants in Fortran source files, -Hollerith constants, and formatted I/O with `ENCODING='DEFAULT'` +``CHARACTER(KIND=1)`` literal constants in Fortran source files, +Hollerith constants, and formatted I/O with ``ENCODING='DEFAULT'`` are not translated. 
-For the purposes of non-default-kind `CHARACTER` constants in Fortran -source files, formatted I/O with `ENCODING='UTF-8'` or non-default-kind -`CHARACTER` value, and conversions between kinds of `CHARACTER`, +For the purposes of non-default-kind ``CHARACTER`` constants in Fortran +source files, formatted I/O with ``ENCODING='UTF-8'`` or non-default-kind +``CHARACTER`` value, and conversions between kinds of ``CHARACTER``\ , by default: -* `CHARACTER(KIND=1)` is assumed to be ISO-8859-1 (Latin-1), -* `CHARACTER(KIND=2)` is assumed to be UCS-2 (16-bit Unicode), and -* `CHARACTER(KIND=4)` is assumed to be UCS-4 (full Unicode in a 32-bit word). + + +* ``CHARACTER(KIND=1)`` is assumed to be ISO-8859-1 (Latin-1), +* ``CHARACTER(KIND=2)`` is assumed to be UCS-2 (16-bit Unicode), and +* ``CHARACTER(KIND=4)`` is assumed to be UCS-4 (full Unicode in a 32-bit word). In particular, conversions between kinds are assumed to be simple zero-extensions or truncation, not table look-ups. We might want to support one or more environment variables to change these -assumptions, especially for `KIND=1` users of ISO-8859 character sets +assumptions, especially for ``KIND=1`` users of ISO-8859 character sets besides Latin-1. -### Lengths +Lengths +^^^^^^^ -Allocatable `CHARACTER` objects in Fortran may defer the specification +Allocatable ``CHARACTER`` objects in Fortran may defer the specification of their lengths until the time of their allocation or whole (non-substring) assignment. Non-allocatable objects (and non-deferred-length allocatables) have lengths that are fixed or assumed from an actual argument, or, -in the case of assumed-length `CHARACTER` functions, their local +in the case of assumed-length ``CHARACTER`` functions, their local declaration in the calling scope. -The elements of `CHARACTER` arrays have the same length. +The elements of ``CHARACTER`` arrays have the same length. 
Assignments to targets that are not deferred-length allocatables will truncate or pad the assigned value to the length of the left-hand side of the assignment. Lengths and offsets that are used by or exposed to Fortran programs via -declarations, substring bounds, and the `LEN()` intrinsic function are always +declarations, substring bounds, and the ``LEN()`` intrinsic function are always represented in units of characters, not bytes. In generated code, assumed-length arguments, the runtime support library, -and in the `elem_len` field of the interoperable descriptor `cdesc_t`, +and in the ``elem_len`` field of the interoperable descriptor ``cdesc_t``\ , lengths are always in units of bytes. The distinction matters only for kinds other than the default. Fortran substrings are rather like subscript triplets into a hidden -"zero" dimension of a scalar `CHARACTER` value, but they cannot have +"zero" dimension of a scalar ``CHARACTER`` value, but they cannot have strides. -### Concatenation +Concatenation +^^^^^^^^^^^^^ -Fortran has one `CHARACTER`-valued intrinsic operator, `//`, which +Fortran has one ``CHARACTER``\ -valued intrinsic operator, ``//``\ , which concatenates its operands (10.1.5.3). The operands must have the same kind type parameter. One or both of the operands may be arrays; if both are arrays, their shapes must be identical. The effective length of the result is the sum of the lengths of the operands. -Parentheses may be ignored, so any `CHARACTER`-valued expression +Parentheses may be ignored, so any ``CHARACTER``\ -valued expression may be "flattened" into a single sequence of concatenations. 
-The result of `//` may be used +The result of ``//`` may be used + + * as an operand to another concatenation, -* as an operand of a `CHARACTER` relation, +* as an operand of a ``CHARACTER`` relation, * as an actual argument, * as the right-hand side of an assignment, -* as the `SOURCE=` or `MOLD=` of an `ALLOCATE` statemnt, -* as the selector or case-expr of an `ASSOCIATE` or `SELECT` construct, +* as the ``SOURCE=`` or ``MOLD=`` of an ``ALLOCATE`` statement, +* as the selector or case-expr of an ``ASSOCIATE`` or ``SELECT`` construct, * as a component of a structure or array constructor, * as the value of a named constant or initializer, -* as the `NAME=` of a `BIND(C)` attribute, -* as the stop-code of a `STOP` statement, +* as the ``NAME=`` of a ``BIND(C)`` attribute, +* as the stop-code of a ``STOP`` statement, * as the value of a specifier of an I/O statement, * or as the value of a statement function. The f18 compiler has a general (but slow) means of implementing concatenation and a specialized (fast) option to optimize the most common case. -#### General concatenation +General concatenation +~~~~~~~~~~~~~~~~~~~~~ In the most general case, the f18 compiler's generated code and runtime support library represent the result as a deferred-length allocatable -`CHARACTER` temporary scalar or array variable that is initialized -as a zero-length array by `AllocatableInitCharacter()` +``CHARACTER`` temporary scalar or array variable that is initialized +as a zero-length array by ``AllocatableInitCharacter()`` and then progressively augmented in place by the values of each of the operands of the concatenation sequence in turn with calls to -`CharacterConcatenate()`. +``CharacterConcatenate()``. Conformability errors are fatal -- Fortran has no means by which a program may recover from them. The result is then used as any other deferred-length allocatable array or scalar would be, and finally deallocated like any other allocatable. 
-The runtime routine `CharacterAssign()` takes care of +The runtime routine ``CharacterAssign()`` takes care of truncating, padding, or replicating the value(s) assigned to the left-hand side, as well as reallocating an nonconforming or deferred-length allocatable left-hand side. It takes the descriptors of the left- and right-hand sides of -a `CHARACTER` assignemnt as its arguments. +a ``CHARACTER`` assignment as its arguments. -When the left-hand side of a `CHARACTER` assignment is a deferred-length +When the left-hand side of a ``CHARACTER`` assignment is a deferred-length allocatable and the right-hand side is a temporary, use of the runtime's -`MoveAlloc()` subroutine instead can save an allocation and a copy. +``MoveAlloc()`` subroutine instead can save an allocation and a copy. -#### Optimized concatenation +Optimized concatenation +~~~~~~~~~~~~~~~~~~~~~~~ -Scalar `CHARACTER(KIND=1)` expressions evaluated as the right-hand sides of +Scalar ``CHARACTER(KIND=1)`` expressions evaluated as the right-hand sides of assignments to independent substrings or whole variables that are not deferred-length allocatables can be optimized into a sequence of calls to the runtime support library that do not allocate temporary memory. -The routine `CharacterAppend()` copies data from the right-hand side value +The routine ``CharacterAppend()`` copies data from the right-hand side value to the remaining space, if any, in the left-hand side object, and returns the new offset of the reduced remaining space. -It is essentially `memcpy(lhs + offset, rhs, min(lhsLength - offset, rhsLength))`. -It does nothing when `offset > lhsLength`. +It is essentially ``memcpy(lhs + offset, rhs, min(lhsLength - offset, rhsLength))``. +It does nothing when ``offset > lhsLength``. -`void CharacterPad()`adds any necessary trailing blank characters. +``void CharacterPad()`` adds any necessary trailing blank characters. 
diff --git a/flang/documentation/ControlFlowGraph.md b/flang/docs/ControlFlowGraph.rst rename from flang/documentation/ControlFlowGraph.md rename to flang/docs/ControlFlowGraph.rst --- a/flang/documentation/ControlFlowGraph.md +++ b/flang/docs/ControlFlowGraph.rst @@ -1,16 +1,10 @@ - - -## Concept +Concept +------- + After a Fortran subprogram has been parsed, its names resolved, and all its semantic constraints successfully checked, the parse tree of its executable part is translated into another abstract representation, -namely the _control flow graph_ described in this note. +namely the *control flow graph* described in this note. This second representation of the subprogram's executable part is suitable for analysis and incremental modification as the subprogram @@ -18,11 +12,13 @@ Many high-level Fortran features are implemented by rewriting portions of a subprogram's control flow graph in place. -### Control Flow Graph -A _control flow graph_ is a collection of simple (_i.e.,_ "non-extended") -basic _blocks_ that comprise straight-line sequences of _actions_ with a +Control Flow Graph +^^^^^^^^^^^^^^^^^^ + +A *control flow graph* is a collection of simple (\ *i.e.,* "non-extended") +basic *blocks* that comprise straight-line sequences of *actions* with a single entry point and a single exit point, and a collection of -directed flow _edges_ (or _arcs_) denoting all possible transitions of +directed flow *edges* (or *arcs*\ ) denoting all possible transitions of control flow that may take place during execution from the end of one basic block to the beginning of another (or itself). @@ -32,10 +28,10 @@ The sequence of actions that constitutes a basic block may include references to user and library procedures. 
Subprogram calls with implicit control flow afterwards, namely -alternate returns and `END=`/`ERR=` labels on input/output, +alternate returns and ``END=``\ /\ ``ERR=`` labels on input/output, will be lowered in translation to a representation that materializes -that control flow into something similar to a computed `GO TO` or -C language `switch` statement. +that control flow into something similar to a computed ``GO TO`` or +C language ``switch`` statement. For convenience in optimization and to simplify the implementation of data flow confluence functions, we may choose to maintain the @@ -46,33 +42,35 @@ invariant property. Fortran subprograms (other than internal subprograms) can have multiple -entry points by using the obsolescent `ENTRY` statement. +entry points by using the obsolescent ``ENTRY`` statement. We will implement such subprograms by constructing a union of their dummy argument lists and using it as part of the definition of a new subroutine or function that can be called by each of the entry points, which are then all converted into wrapper routines that -pass a selector value as an additional argument to drive a `switch` on entry +pass a selector value as an additional argument to drive a ``switch`` on entry to the new subprogram. This transformation ensures that every subprogram's control -flow graph has a well-defined `START` node. +flow graph has a well-defined ``START`` node. Statement labels can be used in Fortran on any statement, but only -the labels that decorate legal destinations of `GO TO` statements +the labels that decorate legal destinations of ``GO TO`` statements need to be implemented in the control flow graph. 
-Specifically, non-executable statements like `DATA`, `NAMELIST`, and -`FORMAT` statements will be extracted into data initialization +Specifically, non-executable statements like ``DATA``\ , ``NAMELIST``\ , and +``FORMAT`` statements will be extracted into data initialization records before or during the construction of the control flow -graph, and will survive only as synonyms for `CONTINUE`. +graph, and will survive only as synonyms for ``CONTINUE``. -Nests of multiple labeled `DO` loops that terminate on the same -label will be have that label rewritten so that `GO TO` within +Nests of multiple labeled ``DO`` loops that terminate on the same +label will have that label rewritten so that ``GO TO`` within the loop nest will arrive at the copy that most closely nests the context. The Fortran standard does not require us to do this, but XLF (at least) works this way. -### Expressions and Statements (Operations and Actions) +Expressions and Statements (Operations and Actions) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + Expressions are trees, not DAGs, of intrinsic operations, resolved function references, constant literals, and data designators. @@ -82,7 +80,7 @@ intrinsic type, templatized over its supported kind type parameter values. Operands are storage-owning indirections to other instances -of `Expression`, instances of constant values, and to representations +of ``Expression``\ , instances of constant values, and to representations of data and function references. These indirections are not nullable apart from the situation in which the operands of an expression are being removed for use elsewhere before @@ -100,37 +98,41 @@ in a basic block of the control flow graph (e.g., the right hand side of an assignment statement). -Each basic block comprises a linear sequence of _actions_. +Each basic block comprises a linear sequence of *actions*. These are represented as a doubly-linked list so that insertion and deletion can be done in constant time. 
Only the last action in a basic block can represent a change to the flow of control. -### Scope Transitions +Scope Transitions +^^^^^^^^^^^^^^^^^ + Some of the various scopes of the symbol table are visible in the control flow -graph as `SCOPE ENTRY` and `SCOPE EXIT` actions. -`SCOPE ENTRY` actions are unique for their corresponding scopes, -while `SCOPE EXIT` actions need not be so. +graph as ``SCOPE ENTRY`` and ``SCOPE EXIT`` actions. +``SCOPE ENTRY`` actions are unique for their corresponding scopes, +while ``SCOPE EXIT`` actions need not be so. It must be the case that any flow of control within the subprogram will enter only scopes that are not yet active, and exit only the most recently entered scope that has not yet been deactivated; i.e., when modeled by a push-down stack that is -pushed by each traversal of a `SCOPE ENTRY` action, +pushed by each traversal of a ``SCOPE ENTRY`` action, the entries of the stack are always distinct, only the scope at -the top of the stack is ever popped by `SCOPE EXIT`, and the stack is empty +the top of the stack is ever popped by ``SCOPE EXIT``\ , and the stack is empty when the subprogram terminates. Further, any references to resolved symbols must be to symbols whose scopes are active. -The `DEALLOCATE` actions and calls to `FINAL` procedures implied by scoped +The ``DEALLOCATE`` actions and calls to ``FINAL`` procedures implied by scoped lifetimes will be explicit in the sequence of actions in the control flow graph. Parallel regions might be partially represented by scopes, or by explicit operations similar to the scope entry and exit operations. -### Data Flow Representation +Data Flow Representation +^^^^^^^^^^^^^^^^^^^^^^^^ + The subprogram text will be in static single assignment form by the time the subprogram arrives at the bridge to the LLVM IR builder. 
Merge points are actions at the heads of basic blocks whose operands @@ -138,24 +140,36 @@ basic blocks whose operands are expression trees (which may refer to merge points). -### Rewriting Transformations +Rewriting Transformations +^^^^^^^^^^^^^^^^^^^^^^^^^ -#### I/O -#### Dynamic allocation -#### Array constructors +I/O +~~~ + +Dynamic allocation +~~~~~~~~~~~~~~~~~~ + +Array constructors +~~~~~~~~~~~~~~~~~~ + +Derived type initialization, deallocation, and finalization +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ -#### Derived type initialization, deallocation, and finalization The machinery behind the complicated semantics of Fortran's derived types -and `ALLOCATABLE` objects will be implemented in large part by the run time +and ``ALLOCATABLE`` objects will be implemented in large part by the run time support library. -#### Actual argument temporaries -#### Array assignments, `WHERE`, and `FORALL` +Actual argument temporaries +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Array assignments, ``WHERE``\ , and ``FORALL`` +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Array operations have shape. -`WHERE` masks have shape. -Their effects on array operations are by means of explicit `MASK` operands that +``WHERE`` masks have shape. +Their effects on array operations are by means of explicit ``MASK`` operands that are part of array assignment operations. -#### Intrinsic function and subroutine calls +Intrinsic function and subroutine calls +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ diff --git a/flang/docs/Directives.rst b/flang/docs/Directives.rst new file mode 100644 --- /dev/null +++ b/flang/docs/Directives.rst @@ -0,0 +1,7 @@ +Compiler directives supported by F18 +==================================== + + +* ``!dir$ fixed`` and ``!dir$ free`` select Fortran source forms. Their effect + persists to the end of the current source file. +* ``!dir$ ignore_tkr (tkr) var-list`` omits checks on type, kind, and/or rank. 
diff --git a/flang/documentation/Extensions.md b/flang/docs/Extensions.rst rename from flang/documentation/Extensions.md rename to flang/docs/Extensions.rst --- a/flang/documentation/Extensions.md +++ b/flang/docs/Extensions.rst @@ -1,11 +1,3 @@ - - As a general principle, this compiler will accept by default and without complaint many legacy features, extensions to the standard language, and features that have been deleted from the standard, @@ -18,66 +10,70 @@ Intentional violations of the standard ====================================== -* Scalar `INTEGER` actual argument expressions (not variables!) - are converted to the kinds of scalar `INTEGER` dummy arguments + + +* Scalar ``INTEGER`` actual argument expressions (not variables!) + are converted to the kinds of scalar ``INTEGER`` dummy arguments when the interface is explicit and the kinds differ. This conversion allows the results of the intrinsics like - `SIZE` that (as mentioned below) may return non-default - `INTEGER` results by default to be passed. A warning is + ``SIZE`` that (as mentioned below) may return non-default + ``INTEGER`` results by default to be passed. A warning is emitted when truncation is possible. -* We are not strict on the contents of `BLOCK DATA` subprograms +* We are not strict on the contents of ``BLOCK DATA`` subprograms so long as they contain no executable code, no internal subprograms, - and allocate no storage outside a named `COMMON` block. (C1415) + and allocate no storage outside a named ``COMMON`` block. (C1415) Extensions, deletions, and legacy features supported by default =============================================================== + + * Tabs in source -* `<>` as synonym for `.NE.` and `/=` -* `$` and `@` as legal characters in names -* Initialization in type declaration statements using `/values/` -* Kind specification with `*`, e.g. 
`REAL*4` -* `DOUBLE COMPLEX` +* ``<>`` as synonym for ``.NE.`` and ``/=`` +* ``$`` and ``@`` as legal characters in names +* Initialization in type declaration statements using ``/values/`` +* Kind specification with ``*``\ , e.g. ``REAL*4`` +* ``DOUBLE COMPLEX`` * Signed complex literal constants -* DEC `STRUCTURE`, `RECORD`, `UNION`, and `MAP` -* Structure field access with `.field` -* `BYTE` as synonym for `INTEGER(KIND=1)` -* Quad precision REAL literals with `Q` -* `X` prefix/suffix as synonym for `Z` on hexadecimal literals -* `B`, `O`, `Z`, and `X` accepted as suffixes as well as prefixes +* DEC ``STRUCTURE``\ , ``RECORD``\ , ``UNION``\ , and ``MAP`` +* Structure field access with ``.field`` +* ``BYTE`` as synonym for ``INTEGER(KIND=1)`` +* Quad precision REAL literals with ``Q`` +* ``X`` prefix/suffix as synonym for ``Z`` on hexadecimal literals +* ``B``\ , ``O``\ , ``Z``\ , and ``X`` accepted as suffixes as well as prefixes * Triplets allowed in array constructors -* Old-style `PARAMETER pi=3.14` statement without parentheses -* `%LOC`, `%VAL`, and `%REF` +* Old-style ``PARAMETER pi=3.14`` statement without parentheses +* ``%LOC``\ , ``%VAL``\ , and ``%REF`` * Leading comma allowed before I/O item list -* Empty parentheses allowed in `PROGRAM P()` -* Missing parentheses allowed in `FUNCTION F` -* Cray based `POINTER(p,x)` and `LOC()` intrinsic (with `%LOC()` as +* Empty parentheses allowed in ``PROGRAM P()`` +* Missing parentheses allowed in ``FUNCTION F`` +* Cray based ``POINTER(p,x)`` and ``LOC()`` intrinsic (with ``%LOC()`` as an alias) -* Arithmetic `IF`. (Which branch should NaN take? Fall through?) -* `ASSIGN` statement, assigned `GO TO`, and assigned format -* `PAUSE` statement +* Arithmetic ``IF``. (Which branch should NaN take? Fall through?) 
+* ``ASSIGN`` statement, assigned ``GO TO``\ , and assigned format +* ``PAUSE`` statement * Hollerith literals and edit descriptors -* `NAMELIST` allowed in the execution part +* ``NAMELIST`` allowed in the execution part * Omitted colons on type declaration statements with attributes -* COMPLEX constructor expression, e.g. `(x+y,z)` -* `+` and `-` before all primary expressions, e.g. `x*-y` -* `.NOT. .NOT.` accepted -* `NAME=` as synonym for `FILE=` +* COMPLEX constructor expression, e.g. ``(x+y,z)`` +* ``+`` and ``-`` before all primary expressions, e.g. ``x*-y`` +* ``.NOT. .NOT.`` accepted +* ``NAME=`` as synonym for ``FILE=`` * Data edit descriptors without width or other details -* `D` lines in fixed form as comments or debug code -* `CONVERT=` on the OPEN and INQUIRE statements -* `DISPOSE=` on the OPEN and INQUIRE statements +* ``D`` lines in fixed form as comments or debug code +* ``CONVERT=`` on the OPEN and INQUIRE statements +* ``DISPOSE=`` on the OPEN and INQUIRE statements * Leading semicolons are ignored before any statement that could have a label -* The character `&` in column 1 in fixed form source is a variant form +* The character ``&`` in column 1 in fixed form source is a variant form of continuation line. * Character literals as elements of an array constructor without an explicit type specifier need not have the same length; the longest literal determines the length parameter of the implicit type, not the first. * Outside a character literal, a comment after a continuation marker (&) need not begin with a comment marker (!). -* Classic C-style /*comments*/ are skipped, so multi-language header +* Classic C-style ``/*comments*/`` are skipped, so multi-language header files are easier to write and use. -* $ and \ edit descriptors are supported in FORMAT to suppress newline +* ``$`` and ``\`` edit descriptors are supported in FORMAT to suppress newline output on user prompts.
* REAL and DOUBLE PRECISION variable and bounds in DO loops * Integer literals without explicit kind specifiers that are out of range @@ -87,18 +83,18 @@ unambiguous: the right hand sides of assignments and initializations of INTEGER entities, and as actual arguments to a few intrinsic functions (ACHAR, BTEST, CHAR). But they cannot be used if the type would not - be known (e.g., `IAND(X'1',X'2')`). + be known (e.g., ``IAND(X'1',X'2')``\ ). * BOZ literals can also be used as REAL values in some contexts where the type is unambiguous, such as initializations of REAL parameters. * EQUIVALENCE of numeric and character sequences (a ubiquitous extension) * Values for whole anonymous parent components in structure constructors - (e.g., `EXTENDEDTYPE(PARENTTYPE(1,2,3))` rather than `EXTENDEDTYPE(1,2,3)` - or `EXTENDEDTYPE(PARENTTYPE=PARENTTYPE(1,2,3))`). + (e.g., ``EXTENDEDTYPE(PARENTTYPE(1,2,3))`` rather than ``EXTENDEDTYPE(1,2,3)`` + or ``EXTENDEDTYPE(PARENTTYPE=PARENTTYPE(1,2,3))``\ ). * Some intrinsic functions are specified in the standard as requiring the same type and kind for their arguments (viz., ATAN with two arguments, ATAN2, DIM, HYPOT, MAX, MIN, MOD, and MODULO); we allow distinct types to be used, promoting - the arguments as if they were operands to an intrinsic `+` operator, + the arguments as if they were operands to an intrinsic ``+`` operator, and defining the result type accordingly. * DOUBLE COMPLEX intrinsics DREAL, DCMPLX, DCONJG, and DIMAG. * INT_PTR_KIND intrinsic returns the kind of c_intptr_t. @@ -111,56 +107,60 @@ * When a scalar CHARACTER actual argument of the same kind is known to have a length shorter than the associated dummy argument, it is extended on the right with blanks, similar to assignment.
-* When a dummy argument is `POINTER` or `ALLOCATABLE` and is `INTENT(IN)`, we +* When a dummy argument is ``POINTER`` or ``ALLOCATABLE`` and is ``INTENT(IN)``\ , we relax enforcement of some requirements on actual arguments that must otherwise hold true for definable arguments. -* Assignment of `LOGICAL` to `INTEGER` and vice versa (but not other types) is +* Assignment of ``LOGICAL`` to ``INTEGER`` and vice versa (but not other types) is allowed. The values are normalized. * An effectively empty source file (no program unit) is accepted and produces an empty relocatable output file. -* A `RETURN` statement may appear in a main program. +* A ``RETURN`` statement may appear in a main program. * DATA statement initialization is allowed for procedure pointers outside structure constructors. Extensions supported when enabled by options -------------------------------------------- + + * C-style backslash escape sequences in quoted CHARACTER literals (but not Hollerith) [-fbackslash] -* Logical abbreviations `.T.`, `.F.`, `.N.`, `.A.`, `.O.`, and `.X.` +* Logical abbreviations ``.T.``\ , ``.F.``\ , ``.N.``\ , ``.A.``\ , ``.O.``\ , and ``.X.`` [-flogical-abbreviations] -* `.XOR.` as a synonym for `.NEQV.` [-fxor-operator] -* The default `INTEGER` type is required by the standard to occupy - the same amount of storage as the default `REAL` type. Default - `REAL` is of course 32-bit IEEE-754 floating-point today. This legacy +* ``.XOR.`` as a synonym for ``.NEQV.`` [-fxor-operator] +* The default ``INTEGER`` type is required by the standard to occupy + the same amount of storage as the default ``REAL`` type. Default + ``REAL`` is of course 32-bit IEEE-754 floating-point today. 
This legacy rule imposes an artificially small constraint in some cases - where Fortran mandates that something have the default `INTEGER` + where Fortran mandates that something have the default ``INTEGER`` type: specifically, the results of references to the intrinsic functions - `SIZE`, `LBOUND`, `UBOUND`, `SHAPE`, and the location reductions - `FINDLOC`, `MAXLOC`, and `MINLOC` in the absence of an explicit - `KIND=` actual argument. We return `INTEGER(KIND=8)` by default in - these cases when the `-flarge-sizes` option is enabled. -* Treat each specification-part like is has `IMPLICIT NONE` + ``SIZE``\ , ``LBOUND``\ , ``UBOUND``\ , ``SHAPE``\ , and the location reductions + ``FINDLOC``\ , ``MAXLOC``\ , and ``MINLOC`` in the absence of an explicit + ``KIND=`` actual argument. We return ``INTEGER(KIND=8)`` by default in + these cases when the ``-flarge-sizes`` option is enabled. +* Treat each specification-part as if it has ``IMPLICIT NONE`` [-fimplicit-none-type-always] -* Ignore occurrences of `IMPLICIT NONE` and `IMPLICIT NONE(TYPE)` +* Ignore occurrences of ``IMPLICIT NONE`` and ``IMPLICIT NONE(TYPE)`` [-fimplicit-none-type-never] Extensions and legacy features deliberately not supported --------------------------------------------------------- -* `.LG.` as synonym for `.NE.` -* `REDIMENSION` -* Allocatable `COMMON` + + +* ``.LG.`` as synonym for ``.NE.`` +* ``REDIMENSION`` +* Allocatable ``COMMON`` * Expressions in formats -* `ACCEPT` as synonym for `READ *` -* `TYPE` as synonym for `PRINT` -* `ARRAY` as synonym for `DIMENSION` -* `VIRTUAL` as synonym for `DIMENSION` -* `ENCODE` and `DECODE` as synonyms for internal I/O -* `IMPLICIT AUTOMATIC`, `IMPLICIT STATIC` -* Default exponent of zero, e.g.
`3.14159E` +* ``ACCEPT`` as synonym for ``READ *`` +* ``TYPE`` as synonym for ``PRINT`` +* ``ARRAY`` as synonym for ``DIMENSION`` +* ``VIRTUAL`` as synonym for ``DIMENSION`` +* ``ENCODE`` and ``DECODE`` as synonyms for internal I/O +* ``IMPLICIT AUTOMATIC``\ , ``IMPLICIT STATIC`` +* Default exponent of zero, e.g. ``3.14159E`` * Characters in defined operators that are neither letters nor digits -* `B` suffix on unquoted octal constants -* `Z` prefix on unquoted hexadecimal constants (dangerous) -* `T` and `F` as abbreviations for `.TRUE.` and `.FALSE.` in DATA (PGI/XLF) +* ``B`` suffix on unquoted octal constants +* ``Z`` prefix on unquoted hexadecimal constants (dangerous) +* ``T`` and ``F`` as abbreviations for ``.TRUE.`` and ``.FALSE.`` in DATA (PGI/XLF) * Use of host FORMAT labels in internal subprograms (PGI-only feature) * ALLOCATE(TYPE(derived)::...) as variant of correct ALLOCATE(derived::...) (PGI only) * Defining an explicit interface for a subprogram within itself (PGI only) @@ -171,7 +171,7 @@ * Comparison of LOGICAL with ==/.EQ. rather than .EQV. (also .NEQV.) (PGI/Intel) * Procedure pointers in COMMON blocks (PGI/Intel) * Underindexing multi-dimensional arrays (e.g., A(1) rather than A(1,1)) (PGI only) -* Legacy PGI `NCHARACTER` type and `NC` Kanji character literals +* Legacy PGI ``NCHARACTER`` type and ``NC`` Kanji character literals * Using non-integer expressions for array bounds (e.g., REAL A(3.14159)) (PGI/Intel) * Mixing INTEGER types as operands to bit intrinsics (e.g., IAND); only two compilers support it, and they disagree on sign extension. @@ -181,7 +181,7 @@ allows a name to be a procedure from one module and a generic interface from another. * Type parameter declarations must come first in a derived type definition; - some compilers allow them to follow `PRIVATE`, or be intermixed with the + some compilers allow them to follow ``PRIVATE``\ , or be intermixed with the component declarations.
* Wrong argument types in calls to specific intrinsics that have different names than the related generics. Some accepted exceptions are listed above in the allowed extensions. @@ -190,6 +190,8 @@ Preprocessing behavior ====================== + + * The preprocessor is always run, whatever the filename extension may be. * We respect Fortran comments in macro actual arguments (like GNU, Intel, NAG; unlike PGI and XLF) on the principle that macro calls should be treated diff --git a/flang/documentation/FortranForCProgrammers.md b/flang/docs/FortranForCProgrammers.rst rename from flang/documentation/FortranForCProgrammers.md rename to flang/docs/FortranForCProgrammers.rst --- a/flang/documentation/FortranForCProgrammers.md +++ b/flang/docs/FortranForCProgrammers.rst @@ -1,11 +1,3 @@ - - Fortran For C Programmers ========================= @@ -18,6 +10,8 @@ Know This At Least ------------------ + + * There have been many implementations of Fortran, often from competing vendors, and the standard language has been defined by U.S. and international standards organizations. The various editions of @@ -32,82 +26,108 @@ can be read only as discouraging their use in new code -- they'll probably always work in any serious implementation. * Fortran has two source forms, which are typically distinguished by - filename suffixes. `foo.f` is old-style "fixed-form" source, and - `foo.f90` is new-style "free-form" source. All language features + filename suffixes. ``foo.f`` is old-style "fixed-form" source, and + ``foo.f90`` is new-style "free-form" source. All language features are available in both source forms. Neither form has reserved words in the sense that C does. Spaces are not required between tokens in fixed form, and case is not significant in either form. * Variable declarations are optional by default. Variables whose - names begin with the letters `I` through `N` are implicitly - `INTEGER`, and others are implicitly `REAL`. 
These implicit typing + names begin with the letters ``I`` through ``N`` are implicitly + ``INTEGER``\ , and others are implicitly ``REAL``. These implicit typing rules can be changed in the source. * Fortran uses parentheses in both array references and function calls. All arrays must be declared as such; other names followed by parenthesized expressions are assumed to be function calls. -* Fortran has a _lot_ of built-in "intrinsic" functions. They are always +* Fortran has a *lot* of built-in "intrinsic" functions. They are always available without a need to declare or import them. Their names reflect the implicit typing rules, so you will encounter names that have been - modified so that they have the right type (e.g., `AIMAG` has a leading `A` - so that it's `REAL` rather than `INTEGER`). + modified so that they have the right type (e.g., ``AIMAG`` has a leading ``A`` + so that it's ``REAL`` rather than ``INTEGER``\ ). * The modern language has means for declaring types, data, and subprogram interfaces in compiled "modules", as well as legacy mechanisms for sharing data and interconnecting subprograms. A Rosetta Stone --------------- + Fortran's language standard and other documentation uses some terminology in particular ways that might be unfamiliar. 
-| Fortran | English | -| ------- | ------- | -| Association | Making a name refer to something else | -| Assumed | Some attribute of an argument or interface that is not known until a call is made | -| Companion processor | A C compiler | -| Component | Class member | -| Deferred | Some attribute of a variable that is not known until an allocation or assignment | -| Derived type | C++ class | -| Dummy argument | C++ reference argument | -| Final procedure | C++ destructor | -| Generic | Overloaded function, resolved by actual arguments | -| Host procedure | The subprogram that contains a nested one | -| Implied DO | There's a loop inside a statement | -| Interface | Prototype | -| Internal I/O | `sscanf` and `snprintf` | -| Intrinsic | Built-in type or function | -| Polymorphic | Dynamically typed | -| Processor | Fortran compiler | -| Rank | Number of dimensions that an array has | -| `SAVE` attribute | Statically allocated | -| Type-bound procedure | Kind of a C++ member function but not really | -| Unformatted | Raw binary | +.. 
list-table:: + :header-rows: 1 + + * - Fortran + - English + * - Association + - Making a name refer to something else + * - Assumed + - Some attribute of an argument or interface that is not known until a call is made + * - Companion processor + - A C compiler + * - Component + - Class member + * - Deferred + - Some attribute of a variable that is not known until an allocation or assignment + * - Derived type + - C++ class + * - Dummy argument + - C++ reference argument + * - Final procedure + - C++ destructor + * - Generic + - Overloaded function, resolved by actual arguments + * - Host procedure + - The subprogram that contains a nested one + * - Implied DO + - There's a loop inside a statement + * - Interface + - Prototype + * - Internal I/O + - ``sscanf`` and ``snprintf`` + * - Intrinsic + - Built-in type or function + * - Polymorphic + - Dynamically typed + * - Processor + - Fortran compiler + * - Rank + - Number of dimensions that an array has + * - ``SAVE`` attribute + - Statically allocated + * - Type-bound procedure + - Kind of a C++ member function but not really + * - Unformatted + - Raw binary + Data Types ---------- -There are five built-in ("intrinsic") types: `INTEGER`, `REAL`, `COMPLEX`, -`LOGICAL`, and `CHARACTER`. + +There are five built-in ("intrinsic") types: ``INTEGER``\ , ``REAL``\ , ``COMPLEX``\ , +``LOGICAL``\ , and ``CHARACTER``. They are parameterized with "kind" values, which should be treated as non-portable integer codes, although in practice today these are the byte sizes of the data. -(For `COMPLEX`, the kind type parameter value is the byte size of one of the -two `REAL` components, or half of the total size.) -The legacy `DOUBLE PRECISION` intrinsic type is an alias for a kind of `REAL` -that should be more precise, and bigger, than the default `REAL`. +(For ``COMPLEX``\ , the kind type parameter value is the byte size of one of the +two ``REAL`` components, or half of the total size.) 
+The legacy ``DOUBLE PRECISION`` intrinsic type is an alias for a kind of ``REAL`` +that should be more precise, and bigger, than the default ``REAL``. -`COMPLEX` is a simple structure that comprises two `REAL` components. +``COMPLEX`` is a simple structure that comprises two ``REAL`` components. -`CHARACTER` data also have length, which may or may not be known at compilation +``CHARACTER`` data also have length, which may or may not be known at compilation time. -`CHARACTER` variables are fixed-length strings and they get padded out +``CHARACTER`` variables are fixed-length strings and they get padded out with space characters when not completely assigned. User-defined ("derived") data types can be synthesized from the intrinsic -types and from previously-defined user types, much like a C `struct`. +types and from previously-defined user types, much like a C ``struct``. Derived types can be parameterized with integer values that either have to be constant at compilation time ("kind" parameters) or deferred to execution ("len" parameters). Derived types can inherit ("extend") from at most one other derived type. -They can have user-defined destructors (`FINAL` procedures). +They can have user-defined destructors (\ ``FINAL`` procedures). They can specify default initial values for their components. With some work, one can also specify a general constructor function, since Fortran allows a generic interface to have the same name as that @@ -119,13 +139,14 @@ Arrays ------ + Arrays are not types in Fortran. Being an array is a property of an object or function, not of a type. Unlike C, one cannot have an array of arrays or an array of pointers, although one can have an array of a derived type that has arrays or pointers as components. Arrays are multidimensional, and the number of dimensions is called -the _rank_ of the array. +the *rank* of the array. In storage, arrays are stored such that the last subscript has the largest stride in memory, e.g.
A(1,1) is followed by A(2,1), not A(1,2). And yes, the default lower bound on each dimension is 1, not 0. @@ -135,37 +156,39 @@ Allocatables ------------ -Modern Fortran programs use `ALLOCATABLE` data extensively. + +Modern Fortran programs use ``ALLOCATABLE`` data extensively. Such variables and derived type components are allocated dynamically. They are automatically deallocated when they go out of scope, much -like C++'s `std::vector<>` class template instances are. -The array bounds, derived type `LEN` parameters, and even the +like C++'s ``std::vector<>`` class template instances are. +The array bounds, derived type ``LEN`` parameters, and even the type of an allocatable can all be deferred to run time. (If you really want to learn all about modern Fortran, I suggest -that you study everything that can be done with `ALLOCATABLE` data, +that you study everything that can be done with ``ALLOCATABLE`` data, and follow up all the references that are made in the documentation -from the description of `ALLOCATABLE` to other topics; it's a feature +from the description of ``ALLOCATABLE`` to other topics; it's a feature that interacts with much of the rest of the language.) I/O --- + Fortran's input/output features are built into the syntax of the language, rather than being defined by library interfaces as in C and C++. There are means for raw binary I/O and for "formatted" transfers to character representations. There are means for random-access I/O using fixed-size records as well as for sequential I/O. -One can scan data from or format data into `CHARACTER` variables via +One can scan data from or format data into ``CHARACTER`` variables via "internal" formatted I/O. I/O from and to files uses a scheme of integer "unit" numbers that is similar to the open file descriptors of UNIX; i.e., one opens a file and assigns it a unit number, then uses that unit number in subsequent -`READ` and `WRITE` statements. +``READ`` and ``WRITE`` statements. 
Formatted I/O relies on format specifications to map values to fields of -characters, similar to the format strings used with C's `printf` family +characters, similar to the format strings used with C's ``printf`` family of standard library functions. -These format specifications can appear in `FORMAT` statements and +These format specifications can appear in ``FORMAT`` statements and be referenced by their labels, in character literals directly in I/O statements, or in character variables. @@ -175,10 +198,11 @@ Subprograms ----------- -Fortran has both `FUNCTION` and `SUBROUTINE` subprograms. + +Fortran has both ``FUNCTION`` and ``SUBROUTINE`` subprograms. They share the same name space, but functions cannot be called as subroutines or vice versa. -Subroutines are called with the `CALL` statement, while functions are +Subroutines are called with the ``CALL`` statement, while functions are invoked with function references in expressions. There is one level of subprogram nesting. @@ -190,6 +214,7 @@ Modules ------- + Modern Fortran has good support for separate compilation and namespace management. The *module* is the basic unit of compilation, although independent @@ -198,7 +223,7 @@ subprograms. Objects from a module are made available for use in other compilation -units via the `USE` statement, which has options for limiting the objects +units via the ``USE`` statement, which has options for limiting the objects that are made available as well as for renaming them. All references to objects in modules are done with direct names or aliases that have been added to the local scope, as Fortran has no means @@ -206,13 +231,14 @@ Arguments --------- + Functions and subroutines have "dummy" arguments that are dynamically associated with actual arguments during calls. Essentially, all argument passing in Fortran is by reference, not value. 
One may restrict access to argument data by declaring that dummy -arguments have `INTENT(IN)`, but that corresponds to the use of -a `const` reference in C++ and does not imply that the data are -copied; use `VALUE` for that. +arguments have ``INTENT(IN)``\ , but that corresponds to the use of +a ``const`` reference in C++ and does not imply that the data are +copied; use ``VALUE`` for that. When it is not possible to pass a reference to an object, or a sparse regular array section of an object, as an actual argument, Fortran @@ -226,18 +252,21 @@ In other words, if some object can be written to under one name, it's never going to be read or written using some other name in that same scope. -``` - SUBROUTINE FOO(X,Y,Z) - X = 3.14159 - Y = 2.1828 - Z = 2 * X ! CAN BE FOLDED AT COMPILE TIME - END -``` + +.. code-block:: + + SUBROUTINE FOO(X,Y,Z) + X = 3.14159 + Y = 2.1828 + Z = 2 * X ! CAN BE FOLDED AT COMPILE TIME + END + This is the opposite of the assumptions under which a C or C++ compiler must labor when trying to optimize code with pointers. Overloading ----------- + Fortran supports a form of overloading via its interface feature. By default, an interface is a means for specifying prototypes for a set of subroutines and functions. @@ -252,26 +281,28 @@ Polymorphism ------------ + Fortran code can be written to accept data of some derived type or -any extension thereof using `CLASS`, deferring the actual type to -execution, rather than the usual `TYPE` syntax. -This is somewhat similar to the use of `virtual` functions in c++. +any extension thereof using ``CLASS``\ , deferring the actual type to +execution, rather than the usual ``TYPE`` syntax. +This is somewhat similar to the use of ``virtual`` functions in C++. -Fortran's `SELECT TYPE` construct is used to distinguish between +Fortran's ``SELECT TYPE`` construct is used to distinguish between possible specific types dynamically, when necessary.
It's a -little like C++17's `std::visit()` on a discriminated union. +little like C++17's ``std::visit()`` on a discriminated union. Pointers -------- + Pointers are objects in Fortran, not data types. Pointers can point to data, arrays, and subprograms. -A pointer can only point to data that has the `TARGET` attribute. -Outside of the pointer assignment statement (`P=>X`) and some intrinsic +A pointer can only point to data that has the ``TARGET`` attribute. +Outside of the pointer assignment statement (\ ``P=>X``\ ) and some intrinsic functions and cases with pointer dummy arguments, pointers are implicitly dereferenced, and the use of their name is a reference to the data to which they point instead. -Unlike C, a pointer cannot point to a pointer *per se*, nor can they be +Unlike C, a pointer cannot point to a pointer *per se*\ , nor can they be used to implement a level of indirection to the management structure of an allocatable. If you assign to a Fortran pointer to make it point at another pointer, @@ -289,6 +320,7 @@ Preprocessing ------------- + There is no standard preprocessing feature, but every real Fortran implementation has some support for passing Fortran source code through a variant of the standard C source preprocessor. @@ -304,22 +336,23 @@ "Object Oriented" Programming ----------------------------- + Fortran doesn't have member functions (or subroutines) in the sense that C++ does, in which a function has immediate access to the members of a specific instance of a derived type. -But Fortran does have an analog to C++'s `this` via *type-bound +But Fortran does have an analog to C++'s ``this`` via *type-bound procedures*. 
This is a means of binding a particular subprogram name to a derived type, possibly with aliasing, in such a way that the subprogram can -be called as if it were a component of the type (e.g., `X%F(Y)`) -and receive the object to the left of the `%` as an additional actual argument, -exactly as if the call had been written `F(X,Y)`. +be called as if it were a component of the type (e.g., ``X%F(Y)``\ ) +and receive the object to the left of the ``%`` as an additional actual argument, +exactly as if the call had been written ``F(X,Y)``. The object is passed as the first argument by default, but that can be changed; indeed, the same specific subprogram can be used for multiple type-bound procedures by choosing different dummy arguments to serve as the passed object. -The equivalent of a `static` member function is also available by saying -that no argument is to be associated with the object via `NOPASS`. +The equivalent of a ``static`` member function is also available by saying +that no argument is to be associated with the object via ``NOPASS``. There's a lot more that can be said about type-bound procedures (e.g., how they support overloading) but this should be enough to get you started with @@ -327,45 +360,46 @@ Pitfalls -------- -Variable initializers, e.g. `INTEGER :: J=123`, are _static_ initializers! + +Variable initializers, e.g. ``INTEGER :: J=123``\ , are *static* initializers! They imply that the variable is stored in static storage, not on the stack, and the initialized value lasts only until the variable is assigned. One must use an assignment statement to implement a dynamic initializer that will apply to every fresh instance of the variable. -Be especially careful when using initializers in the newish `BLOCK` construct, +Be especially careful when using initializers in the newish ``BLOCK`` construct, which perpetuates the interpretation as static data. (Derived type component initializers, however, do work as expected.) 
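
The static-initializer pitfall can be sketched as follows. This is a minimal
illustration, not code from the compiler or its tests; the program and
subroutine names are hypothetical. The first subroutine relies on an
initializer, so its counter keeps its value across calls, while the second
uses an assignment and starts from zero on every call.

.. code-block:: fortran

  program save_pitfall
    implicit none
    integer :: i
    do i = 1, 3
      call count_static()
      call count_dynamic()
    end do
  contains
    subroutine count_static()
      ! The initializer implies static storage: J is set to 0 once,
      ! not on each call.
      integer :: j = 0
      j = j + 1
      print *, 'static  j =', j
    end subroutine count_static

    subroutine count_dynamic()
      integer :: j
      ! The assignment runs on every call, so J always restarts at 0.
      j = 0
      j = j + 1
      print *, 'dynamic j =', j
    end subroutine count_dynamic
  end program save_pitfall

Over the three iterations, ``count_static`` reports 1, 2, and 3, while
``count_dynamic`` reports 1 each time.
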
If you see an assignment to an array that's never been declared as such, -it's probably a definition of a *statement function*, which is like -a parameterized macro definition, e.g. `A(X)=SQRT(X)**3`. +it's probably a definition of a *statement function*\ , which is like +a parameterized macro definition, e.g. ``A(X)=SQRT(X)**3``. In the original Fortran language, this was the only means for user function definitions. Today, of course, one should use an external or internal function instead. Fortran expressions don't bind exactly like C's do. -Watch out for exponentiation with `**`, which of course C lacks; it -binds more tightly than negation does (e.g., `-2**2` is -4), +Watch out for exponentiation with ``**``\ , which of course C lacks; it +binds more tightly than negation does (e.g., ``-2**2`` is -4), and it binds to the right, unlike what any other Fortran and most -C operators do; e.g., `2**2**3` is 256, not 64. +C operators do; e.g., ``2**2**3`` is 256, not 64. Logical values must be compared with special logical equivalence -relations (`.EQV.` and `.NEQV.`) rather than the usual equality +relations (\ ``.EQV.`` and ``.NEQV.``\ ) rather than the usual equality operators. A Fortran compiler is allowed to short-circuit expression evaluation, but not required to do so. -If one needs to protect a use of an `OPTIONAL` argument or possibly -disassociated pointer, use an `IF` statement, not a logical `.AND.` +If one needs to protect a use of an ``OPTIONAL`` argument or possibly +disassociated pointer, use an ``IF`` statement, not a logical ``.AND.`` operation. In fact, Fortran can remove function calls from expressions if their values are not required to determine the value of the expression's -result; e.g., if there is a `PRINT` statement in function `F`, it -may or may not be executed by the assignment statement `X=0*F()`. +result; e.g., if there is a ``PRINT`` statement in function ``F``\ , it +may or may not be executed by the assignment statement ``X=0*F()``. 
(Well, it probably will be, in practice, but compilers always reserve the right to optimize better.) -Unless they have an explicit suffix (`1.0_8`, `2.0_8`) or a `D` -exponent (`3.0D0`), real literal constants in Fortran have the -default `REAL` type -- *not* `double` as in the case in C and C++. +Unless they have an explicit suffix (\ ``1.0_8``\ , ``2.0_8``\ ) or a ``D`` +exponent (\ ``3.0D0``\ ), real literal constants in Fortran have the +default ``REAL`` type -- *not* ``double`` as in the case in C and C++. If you're not careful, you can lose precision at compilation time from your constant values and never know it. diff --git a/flang/documentation/FortranIR.md b/flang/docs/FortranIR.rst rename from flang/documentation/FortranIR.md rename to flang/docs/FortranIR.rst --- a/flang/documentation/FortranIR.md +++ b/flang/docs/FortranIR.rst @@ -1,61 +1,63 @@ - +Design: Fortran IR +================== -# Design: Fortran IR - -## Introduction +Introduction +------------ After semantic analysis is complete and it has been determined that the compiler has a legal Fortran program as input, the parse tree will be lowered to an intermediate representation for the purposes of high-level analysis and optimization. In this document, that intermediate representation will be called Fortran IR or FIR. The pass that converts from the parse tree and other data structures of the front-end to FIR will be called the "Burnside bridge". FIR will be an explicit, operational, and strongly-typed representation, which shall encapsulate control-flow as graphs. -## Requirements +Requirements +------------ -### White Paper: [Control Flow Graph](ControlFlowGraph.md)1 +White Paper: `Control Flow Graph `_\ :raw-html-m2r:`1` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ This is a list of requirements extracted from that document, which will be referred to as CFG-WP. -1. Control flow to be explicit (e.g. ERR= specifiers) -2. 
May need critical edge splitting -3. Lowering of procedures with ENTRY statements is specified -4. Procedures will have a start node -5. Non-executable statements will be ignored -6. Labeled DO loop execution with GOTO specified -7. Operations and actions (statements) are defined -8. The last statement in a basic block can represent a change in control flow -9. Scope transitions to be made explicit (special actions) -10. The IR will be in SSA form -### Explicit Control Flow +#. Control flow to be explicit (e.g. ERR= specifiers) +#. May need critical edge splitting +#. Lowering of procedures with ENTRY statements is specified +#. Procedures will have a start node +#. Non-executable statements will be ignored +#. Labeled DO loop execution with GOTO specified +#. Operations and actions (statements) are defined +#. The last statement in a basic block can represent a change in control flow +#. Scope transitions to be made explicit (special actions) +#. The IR will be in SSA form + +Explicit Control Flow +^^^^^^^^^^^^^^^^^^^^^ In Fortran, there are a number of statements that result in control flow to statements other than the one immediately subsequent. These can be sorted into two categories: structured and unstructured. -#### Structured Control Flow +Structured Control Flow +~~~~~~~~~~~~~~~~~~~~~~~ -Fortran has executable constructs that imply three basic control flow forms. The first form is a structured loop (DO construct)2. The second form is a structured cascade of conditional branches (IF construct, IF statement,3 WHERE construct). The third form is a structured multiway branch (SELECT CASE, SELECT RANK, and SELECT TYPE constructs). The FORALL construct, while it implies a semantic model of interleaved iterations, can be modeled as a special single-entry single-exit region in FIR perhaps with backstage marker statements.4 +Fortran has executable constructs that imply three basic control flow forms. 
The first form is a structured loop (DO construct)\ :raw-html-m2r:`2`. The second form is a structured cascade of conditional branches (IF construct, IF statement,\ :raw-html-m2r:`3` WHERE construct). The third form is a structured multiway branch (SELECT CASE, SELECT RANK, and SELECT TYPE constructs). The FORALL construct, while it implies a semantic model of interleaved iterations, can be modeled as a special single-entry single-exit region in FIR perhaps with backstage marker statements.\ :raw-html-m2r:`4` -The CYCLE and EXIT statements interact with the above structured executable constructs by providing structured transfers of control.5 CYCLE (possibly named) is only valid in DO constructs and creates an alternate backedge in the enclosing loop. EXIT transfers control out of the enclosing (possibly named) construct, which need not be a DO construct. +The CYCLE and EXIT statements interact with the above structured executable constructs by providing structured transfers of control.\ :raw-html-m2r:`5` CYCLE (possibly named) is only valid in DO constructs and creates an alternate backedge in the enclosing loop. EXIT transfers control out of the enclosing (possibly named) construct, which need not be a DO construct. -#### Unstructured Control Flow +Unstructured Control Flow +~~~~~~~~~~~~~~~~~~~~~~~~~ Fortran also has mechanisms of transferring control between a statement and another statement with a corresponding label. The origin of these edges can be GOTO statements, computed GOTO statements, assigned GOTO statements, arithmetic IF statements, alt-return specifications, and END/EOR/ERR I/O specifiers. These statements are "unstructured" in the sense that the target of the control-flow has fewer constraints and the labelled statements must be linked to their origins. Another category of unstructured control flow are statements that terminate execution. These include RETURN, FAIL IMAGE, STOP and ERROR STOP statements. 
The PAUSE statement can be modeled as a call to the runtime. -### Operations +Operations +^^^^^^^^^^ The compiler's to be determined optimization passes will inform us as to the exact composition of FIR at the operations level. The details here will necessarily change, so please read them with a grain of salt. -The plan (see CFG-WP) is that statements (actions) will be a veneer model of Fortran syntactical executable constructs. Fortran statements will correspond one to one with actions. Actions will be composed of and own objects of Fortran::evaluate::GenericExprWrapper. Values of type GenericExprWrapper will have Fortran types. This implies that actions will not be in an explicit data flow representation and have optional type information.6 Initially, values will bind to symbols in a context and have an implicit use-def relation. An action statement may entail a "big step" operation with many side-effects. No semantics has been defined at this time. Actions may reference other non-executable statements from the parse tree in some to be determined manner. +The plan (see CFG-WP) is that statements (actions) will be a veneer model of Fortran syntactical executable constructs. Fortran statements will correspond one to one with actions. Actions will be composed of and own objects of Fortran::evaluate::GenericExprWrapper. Values of type GenericExprWrapper will have Fortran types. This implies that actions will not be in an explicit data flow representation and have optional type information.\ :raw-html-m2r:`6` Initially, values will bind to symbols in a context and have an implicit use-def relation. An action statement may entail a "big step" operation with many side-effects. No semantics has been defined at this time. Actions may reference other non-executable statements from the parse tree in some to be determined manner. From the CFG-WP, it is stated that the FIR will ultimately be in an SSA form. 
It is clear that a later pass can rewrite the values/expressions and construct a factored use-def version of the expressions. This may/should also involve expanding "big step" actions to a series of instructions and introducing typing information for all instructions. Again, the exact "lowered representation" will be informed by the requirements of the optimization passes and is presently to be determined. -### Other +Other +^^^^^ Overall project goals include becoming part of the LLVM ecosystem as well as using LLVM as a backend. @@ -69,7 +71,7 @@ Labeled DO loops are converted to non-labeled DO loops in the semantics processing. -The last statement in a basic block can represent a change in control flow. LLVM-IR and SIL7 require that basic blocks end with a terminator. FIR will also have terminators. +The last statement in a basic block can represent a change in control flow. LLVM-IR and SIL\ :raw-html-m2r:`7` require that basic blocks end with a terminator. FIR will also have terminators. The CFG-WP states that scope transitions are to be made explicit. We will cover this more below. @@ -77,53 +79,62 @@ Data objects with process lifetime will be captured indirectly by a reference to the (global) symbol table. -## Exploration +Exploration +----------- -### Construction +Construction +^^^^^^^^^^^^ -Our aim to construct a CFG where all control-flow is explicitly modeled by relations. A basic block will be a sequence of statements for which if the first statement is executed then all other statements in the basic block will also be executed, in order.8 A CFG is therefore this set of basic blocks and the control-flow relations between those blocks. +Our aim is to construct a CFG where all control-flow is explicitly modeled by relations. 
A basic block will be a sequence of statements for which if the first statement is executed then all other statements in the basic block will also be executed, in order.\ :raw-html-m2r:`8` A CFG is therefore this set of basic blocks and the control-flow relations between those blocks. -#### Alternative: direct approach +Alternative: direct approach +~~~~~~~~~~~~~~~~~~~~~~~~~~~~ The CFG can be directly constructed by traversing the parse tree, threading contextual state, and building basic blocks along with control-flow relationships. + * Pro: Straightforward implementation when control-flow is well-structured as the contextual state parallels the syntax of the language closely. * Con: The contextual state needed can become large and difficult to manage in the presence of unstructured control-flow. For example, not every labeled statement in Fortran may be a control-flow destination. * Con: The contextual state must deal with the recursive nature of the parse tree. * Con: Complexity. Since structured constructs cohabitate with unstructured constructs, the context needs to carry information about all combinations until the basic blocks and relations are fully elaborated. -#### Alternative: linearized approach (decomposing the problem) +Alternative: linearized approach (decomposing the problem) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Instead of constructing the CFG directly from a parse tree traversal, an intermediate form can be constructed to explicitly capture the executable statements, which ones give rise to control-flow graph edge sources, and which are control-flow graph edge targets. This linearized form flattens the tree structure of the parse tree. The linearized form does not require recursive visitation of nested constructs and can be used to directly identify the entries and exits of basic blocks. 
While each control-flow source statement is explicit in the traversal, it can be the case that not all of the targets have been traversed yet (references to forward basic blocks), and those basic blocks will not yet have been created. These relations can be captured at the time the source is traversed, added to a to do list, and then completed when all the basic blocks for the procedure have been created. Specifically, at the point when we create a terminator all information is known to create the FIR terminator, however all basic blocks that may be referenced may not have been created. Those are resolved in one final "clean up" pass over a list of closures. + * Con: An extra representation must be defined and constructed. * Pro: This representation reifies all the information that is referred to as contextual state in the direct approach. * Pro: Constructing the linearized form can be done with a simple traversal of the parse tree. * Pro: Once composed the linearized form can be traversed and a CFG directly constructed. This greatly reduces bookkeeping of contextual state. -### Details +Details +^^^^^^^ -#### Grappling with Control Flow +Grappling with Control Flow +~~~~~~~~~~~~~~~~~~~~~~~~~~~ Above, various Fortran executable constructs were discussed with respect to how they (may) give rise to control flow. These Fortran statements are mapped to a small number of FIR statements: ReturnStmt, BranchStmt, SwitchStmt, IndirectBrStmt, and UnreachableStmt. -_ReturnStmt_: execution leaves the enclosing Procedure. A ReturnStmt can return an optional value. This would appear for RETURN statements or at END SUBROUTINE. +*ReturnStmt*\ : execution leaves the enclosing Procedure. A ReturnStmt can return an optional value. This would appear for RETURN statements or at END SUBROUTINE. -_BranchStmt_: execution of the current basic block ends. If the branch is unconditional then control transfers to exactly one successor basic block. 
If the branch is conditional then control transfers to exactly one of two successor blocks depending on the true/false value of the condition. All successors must be in the current Procedure. Unconditional branches would appear for GOTO statements. Conditional branches would appear for IF constructs, IF statements, etc. +*BranchStmt*\ : execution of the current basic block ends. If the branch is unconditional then control transfers to exactly one successor basic block. If the branch is conditional then control transfers to exactly one of two successor blocks depending on the true/false value of the condition. All successors must be in the current Procedure. Unconditional branches would appear for GOTO statements. Conditional branches would appear for IF constructs, IF statements, etc. -_SwitchStmt_: Exactly one of multiple successors is selected based on the control expression. Successors are pairs of case expressions and basic blocks. If the control expression compares to the case expression and returns true, then that control transfers to that block. There may be one special block, the default block, that is selected if none of the case expressions compares true. This would appear for SELECT CASE, SELECT TYPE, SELECT RANK, COMPUTED GOTO, WRITE with exceptional condition label specificers, alternate return specifiers, etc. +*SwitchStmt*\ : Exactly one of multiple successors is selected based on the control expression. Successors are pairs of case expressions and basic blocks. If the control expression compares to the case expression and returns true, then control transfers to that block. There may be one special block, the default block, that is selected if none of the case expressions compares true. This would appear for SELECT CASE, SELECT TYPE, SELECT RANK, COMPUTED GOTO, WRITE with exceptional condition label specifiers, alternate return specifiers, etc. 
-_IndirectBrStmt_: A variable is loaded with the address of a basic block in the containing Procedure. Control is transferred to the contents of this variable. An IndirectBrStmt also requires a complete list of potential basic blocks that may be loaded into the variable. This would appear for ASSIGNED GOTO. +*IndirectBrStmt*\ : A variable is loaded with the address of a basic block in the containing Procedure. Control is transferred to the contents of this variable. An IndirectBrStmt also requires a complete list of potential basic blocks that may be loaded into the variable. This would appear for ASSIGNED GOTO. Supporting ASSIGNED GOTO offers a little extra challenge as the ASSIGNED GOTO statement's list of target labels is optional. If that list is not present, then the procedure must be analyzed to find ASSIGN statements. The implementation proactively looks for ASSIGN statements and keeps a dictionary mapping an assigned Symbol to its set of targets. When constructing the CFG, ASSIGNED GOTOs can be processed as to potential targets either from the list provided in the ASSIGNED GOTO or from the analysis pass. -Alternatively, ASSIGNED GOTO could be implemented as a _SwitchStmt_ that tests on a compiler-defined value and fully elaborates all potential target basic blocks. +Alternatively, ASSIGNED GOTO could be implemented as a *SwitchStmt* that tests on a compiler-defined value and fully elaborates all potential target basic blocks. -_UnreachableStmt_: If control reaches an unreachable statement, then an error has occurred. Calls to library routines that do not return should be followed by an UnreachableStmt. An example would be the STOP statement. +*UnreachableStmt*\ : If control reaches an unreachable statement, then an error has occurred. Calls to library routines that do not return should be followed by an UnreachableStmt. An example would be the STOP statement. 
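The five terminator kinds described above can be modeled as a closed sum type. The following is a minimal sketch, assuming a plain index (``BlockId``) stands in for a reference to a basic block; the struct layouts, ``TerminatorSketch``, and ``Successors()`` are hypothetical illustrations, not the actual FIR classes.

```cpp
#include <cstddef>
#include <optional>
#include <variant>
#include <vector>

// Illustrative stand-ins only; these are not the actual FIR classes.
using BlockId = std::size_t;

struct ReturnStmt {};  // execution leaves the procedure: no successors
struct BranchStmt {
  BlockId target;                    // taken when unconditional, or when the condition is true
  std::optional<BlockId> otherwise;  // present only for a conditional branch
};
struct SwitchStmt {
  std::vector<BlockId> caseTargets;     // one block per case expression
  std::optional<BlockId> defaultBlock;  // selected when no case compares true
};
struct IndirectBrStmt {
  std::vector<BlockId> potentialTargets;  // complete list of possible targets
};
struct UnreachableStmt {};  // control must never get here: no successors

using TerminatorSketch = std::variant<ReturnStmt, BranchStmt, SwitchStmt,
    IndirectBrStmt, UnreachableStmt>;

// Computes the successor blocks implied by each kind of terminator.
struct SuccessorVisitor {
  std::vector<BlockId> operator()(const ReturnStmt &) const { return {}; }
  std::vector<BlockId> operator()(const BranchStmt &b) const {
    std::vector<BlockId> result{b.target};
    if (b.otherwise) {
      result.push_back(*b.otherwise);
    }
    return result;
  }
  std::vector<BlockId> operator()(const SwitchStmt &s) const {
    std::vector<BlockId> result{s.caseTargets};
    if (s.defaultBlock) {
      result.push_back(*s.defaultBlock);
    }
    return result;
  }
  std::vector<BlockId> operator()(const IndirectBrStmt &i) const {
    return i.potentialTargets;
  }
  std::vector<BlockId> operator()(const UnreachableStmt &) const { return {}; }
};

std::vector<BlockId> Successors(const TerminatorSketch &t) {
  return std::visit(SuccessorVisitor{}, t);
}
```

Using a ``std::variant`` here means that a newly added terminator kind fails to compile until ``SuccessorVisitor`` handles it, which is one way to keep the CFG edges exhaustive.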
-#### Scope +Scope +~~~~~ In the CFG-WP, scopes are meant to be captured by a pair of backstage statements for entering and exiting a particular scope. In structured code, these pairs would not be problematic; however, control flow in Fortran is ad hoc, particularly in legacy Fortran. In short, Fortran does not have a clean sense of structure with respect to scope. @@ -131,51 +142,61 @@ Once the basic blocks are constructed, CFG edges defined, and the CFG is simplified, a simple pass that analyzes the region bounding boxes can decorate the basic blocks with the SCOPE ENTER and SCOPE EXIT statements and flatten/remove the region structure. It will then be the burden of any optimization passes to guarantee legal orderings of SCOPE ENTER and SCOPE EXIT pairs. + * Pro: Separation of concerns allows for simpler, easier to maintain code * Pro: Simplification of the CFG can be done without worrying about SCOPE markers * Pro: Allows a precise superimposing of all Fortran constructs with scoping considerations over an otherwise ad hoc CFG. * Con: Adds "an extra layer" to FIR as compared to SIL. However, that can be mitigated/made inconsequential by a pass that flattens the Region tree and inserts the backstage SCOPE marker statements. -#### Structure +Structure +~~~~~~~~~ -_Program_: A program instance is the top-level object that contains the representation of all the code being compiled, the compilation unit. It contains a list of procedures and a reference to the global symbol table. +*Program*\ : A program instance is the top-level object that contains the representation of all the code being compiled, the compilation unit. It contains a list of procedures and a reference to the global symbol table. -_Procedure_: This is a named Fortran procedure (subroutine or function). It contains a (hierarchical) list of regions. It also owns the master list of all basic blocks for the procedure. +*Procedure*\ : This is a named Fortran procedure (subroutine or function). 
It contains a (hierarchical) list of regions. It also owns the master list of all basic blocks for the procedure. -_Region_: A region is owned by a procedure or by another region. A region owns a reference to a scope in the symbol table tree. The list of delineated basic blocks can also be requested from a region. +*Region*\ : A region is owned by a procedure or by another region. A region owns a reference to a scope in the symbol table tree. The list of delineated basic blocks can also be requested from a region. -_Basic block_: A basic block is owned by a procedure. A basic block owns a list of statements. The last statement in the list must be a terminator, and no other statement in the list can be a terminator. A basic block owns a list of its predecessors, which are also basic blocks. (Precisely, it is this level of FIR that is the CFG.) +*Basic block*\ : A basic block is owned by a procedure. A basic block owns a list of statements. The last statement in the list must be a terminator, and no other statement in the list can be a terminator. A basic block owns a list of its predecessors, which are also basic blocks. (Precisely, it is this level of FIR that is the CFG.) -_Statement_: An executable Fortran construct that owns/refers to expressions, symbols, scopes, etc. produced by the front-end. +*Statement*\ : An executable Fortran construct that owns/refers to expressions, symbols, scopes, etc. produced by the front-end. -_Terminator_: A statement that orchestrates control-flow. Terminator statements may reference other basic blocks and can be accessed by their parent basic block to discover successor blocks, if any. +*Terminator*\ : A statement that orchestrates control-flow. Terminator statements may reference other basic blocks and can be accessed by their parent basic block to discover successor blocks, if any. 
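The ownership relations listed above, together with the rule that a basic block ends in exactly one terminator, might be sketched as follows. All type and member names are assumptions made for illustration; in particular, representing block references as indices into the procedure's master list is just one possible choice.

```cpp
#include <cstddef>
#include <vector>

// Illustrative stand-ins only; these are not the actual FIR classes.
struct Statement {
  bool isTerminator;  // true for statements that orchestrate control flow
};

struct BasicBlock {
  std::vector<Statement> statements;      // owned by the block
  std::vector<std::size_t> predecessors;  // indices into Procedure::blocks
};

struct Region {
  std::vector<Region> subregions;    // a region may own nested regions
  std::vector<std::size_t> blocks;   // blocks delineated by this region
};

struct Procedure {
  std::vector<BasicBlock> blocks;  // master list of all basic blocks
  std::vector<Region> regions;     // hierarchical list of regions
};

// A block is well formed when its last statement is a terminator
// and no earlier statement is one.
bool IsWellFormed(const BasicBlock &block) {
  if (block.statements.empty()) {
    return false;
  }
  for (std::size_t i = 0; i + 1 < block.statements.size(); ++i) {
    if (block.statements[i].isTerminator) {
      return false;
    }
  }
  return block.statements.back().isTerminator;
}
```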
-#### Support +Support +~~~~~~~ Since there is some state that needs to be maintained and forwarded as the FIR is constructed, a FIRBuilder can be used for convenience. The FIRBuilder constructs statements and updates the CFG accordingly. To support visualization, there is a support class to dump the FIR to a dotty graph. -### Data Structures +Data Structures +^^^^^^^^^^^^^^^ FIR is intentionally similar to SIL from the statement level up to the level of a program. -#### Alternative: LLVM +Alternative: LLVM +~~~~~~~~~~~~~~~~~ Program, procedure, region, and basic block all leverage code from LLVM, in much the same way as SIL. These data structures have significant investment and engineering behind their use in compilers, and it makes sense to leverage that work. + * Pro: Uses LLVM data structures, pervasive in compiler projects such as LLVM, SIL, etc. * Pro: Get used to seeing and using LLVM, as f18 aims to be an LLVM project * Con: Uses LLVM data structures, which the project has been avoiding -#### Alternative: C++ Standard Template Library +Alternative: C++ Standard Template Library +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ Clearly, the STL can be used to maintain lists, etc. + * Pro: Keeps the number of libraries minimal * Con: The STL is general purpose and not necessarily tuned to support compiler construction -#### Alternative: Boost, Library XYZ, etc. +Alternative: Boost, Library XYZ, etc. +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + * Con: Don't see a strong motivation at present for adding another library. @@ -184,21 +205,24 @@ The operations (expressions) owned/referenced by a statement, variable references, etc. will be data structures from the Fortran::evaluate, Fortran::semantics, etc. namespaces. +.. raw:: html + +
+ -
-1 CFG paper. https://bit.ly/2q9IRaQ +:raw-html-m2r:`1` CFG paper. https://bit.ly/2q9IRaQ -2 All labeled DO sequences will have been translated to DO constructs by semantic analysis. +:raw-html-m2r:`2` All labeled DO sequences will have been translated to DO constructs by semantic analysis. -3 IF statements are handled like IF constructs with no ELSE alternatives. +:raw-html-m2r:`3` IF statements are handled like IF constructs with no ELSE alternatives. -4 In a subsequent discussion, we may want to lower FORALL constructs to semantically distinct loops or even another canonical representation. +:raw-html-m2r:`4` In a subsequent discussion, we may want to lower FORALL constructs to semantically distinct loops or even another canonical representation. -5 These statements are only valid in structured constructs and the branches are well-defined by that executable construct. +:raw-html-m2r:`5` These statements are only valid in structured constructs and the branches are well-defined by that executable construct. -6 Unlike SIL and LLVM-IR. +:raw-html-m2r:`6` Unlike SIL and LLVM-IR. -7 SIL is the Swift (high-level) intermediate language. https://bit.ly/2RHW0DQ +:raw-html-m2r:`7` SIL is the Swift (high-level) intermediate language. https://bit.ly/2RHW0DQ -8 Single-threaded semantics. +:raw-html-m2r:`8` Single-threaded semantics. diff --git a/flang/docs/IORuntimeInternals.rst b/flang/docs/IORuntimeInternals.rst new file mode 100644 --- /dev/null +++ b/flang/docs/IORuntimeInternals.rst @@ -0,0 +1,351 @@ +Fortran I/O Runtime Library Internal Design +=========================================== + +This note is meant to be an overview of the design of the *implementation* +of the f18 Fortran compiler's runtime support library for I/O statements. + +The *interface* to the I/O runtime support library is defined in the +C++ header file ``runtime/io-api.h``. 
+This interface was designed to minimize the amount of complexity exposed +to its clients, which are of course the sequences of calls generated by +the compiler to implement each I/O statement. +By keeping this interface as simple as possible, we hope that we have +lowered the risk of future incompatible changes that would necessitate +recompilation of Fortran codes in order to link with later versions of +the runtime library. +As one will see in ``io-api.h``\ , the interface is also directly callable +from C and C++ programs. + +The I/O facilities of the Fortran 2018 language are specified in the +language standard in its clauses 12 (I/O statements) and 13 (\ ``FORMAT``\ ). +It's a complicated collection of language features: + + +* Files can comprise *records* or *streams*. +* Records can be fixed-length or variable-length. +* Record files can be accessed sequentially or directly (random access). +* Files can be *formatted*\ , or *unformatted* raw bits. +* ``CHARACTER`` scalars and arrays can be used as if they were + fixed-length formatted sequential record files. +* Formatted I/O can be under control of a ``FORMAT`` statement + or ``FMT=`` specifier, *list-directed* with default formatting chosen + by the runtime, or ``NAMELIST``\ , in which a collection of variables + can be given a name and passed as a group to the runtime library. +* Sequential records of a file can be partially processed by one + or more *non-advancing* I/O statements and eventually completed by + another. +* ``FORMAT`` strings can manipulate the position in the current + record arbitrarily, causing re-reading or overwriting. +* Floating-point output formatting supports more rounding modes + than the IEEE standard for floating-point arithmetic. + +The Fortran I/O runtime support library is written in C++17, and +uses some C++17 standard library facilities, but it is intended +to not have any link-time dependences on the C++ runtime support +library or any LLVM libraries. 
+This is important because there are at least two C++ runtime support +libraries, and we don't want Fortran application builders to have to +build multiple versions of their codes; neither do we want to require +them to ship LLVM libraries along with their products. + +Consequently, dynamic memory allocation in the Fortran runtime +uses only C's ``malloc()`` and ``free()`` functions, and the few +C++ standard class templates that we instantiate in the library have been +modified with optional template arguments that override their +allocators and deallocators. + +Conversions between the many binary floating-point formats supported +by f18 and their decimal representations are performed with the same +template library of fast conversion algorithms used to interpret +floating-point values in Fortran source programs and to emit them +to module files. + +Overview of Classes +=================== + +A suite of C++ classes and class templates are composed to construct +the Fortran I/O runtime support library. +They (mostly) reside in the C++ namespace ``Fortran::runtime::io``. +They are summarized here in a bottom-up order of dependence. + +The header and C++ implementation source file names of these +classes are in the process of being vigorously rearranged and +modified; use ``grep`` or an IDE to discover these classes in +the source for now. (Sorry!) + +``Terminator`` +------------------ + +A general facility for the entire library, ``Terminator`` latches a +source program statement location in terms of an unowned pointer to +its source file path name and line number and uses them to construct +a fatal error message if needed. +It is used for both user program errors and internal runtime library crashes. 
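The latching idea can be illustrated with a small sketch. ``LocationLatch`` and its message format are hypothetical and are not the runtime's ``Terminator`` API. The sketch also returns a ``std::string`` for ease of demonstration, which the real library would presumably avoid given its goal of not depending on the C++ runtime support library.

```cpp
#include <cstdio>
#include <string>

// Hypothetical stand-in for the idea behind the runtime's Terminator class.
class LocationLatch {
public:
  LocationLatch(const char *sourceFile, int sourceLine)
      : sourceFile_{sourceFile}, sourceLine_{sourceLine} {}

  // Builds the text of a fatal error message from the latched location.
  // A real runtime would emit the message and then terminate the program.
  std::string Describe(const char *what) const {
    char buffer[256];
    std::snprintf(buffer, sizeof buffer, "%s:%d: fatal error: %s",
        sourceFile_ ? sourceFile_ : "(unknown source)", sourceLine_, what);
    return buffer;
  }

private:
  const char *sourceFile_;  // unowned; must outlive this object
  int sourceLine_;
};
```

Because the path is held as an unowned pointer, latching a location costs only a pointer and an integer per I/O statement.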
+ +``IoErrorHandler`` +---------------------- + +When I/O error conditions arise at runtime that the Fortran program +might have the privilege to handle itself via ``ERR=``\ , ``END=``\ , or +``EOR=`` labels and/or by an ``IOSTAT=`` variable, this subclass of +``Terminator`` is used to either latch the error indication or to crash. +It sorts out priorities in the case of multiple errors and determines +the final ``IOSTAT=`` value at the end of an I/O statement. + +``MutableModes`` +-------------------- + +Fortran's formatted I/O statements are affected by a suite of +modes that can be configured by ``OPEN`` statements, overridden by +data transfer I/O statement control lists, and further overridden +between data items with control edit descriptors in a ``FORMAT`` string. +These modes are represented with a ``MutableModes`` instance, and these +are instantiated and copied where one would expect them to be in +order to properly isolate their modifications. +The modes in force at the time each data item is processed constitute +a member of each ``DataEdit``. + +``DataEdit`` +---------------- + +Represents a single data edit descriptor from a ``FORMAT`` statement +or ``FMT=`` character value, with some hidden extensions to also +support formatting of list-directed transfers. +It holds an instance of ``MutableModes``\ , and also has a repetition +count for when an array appears as a data item in the *io-list*. +For simplicity and efficiency, each data edit descriptor is +encoded in the ``DataEdit`` as a simple capitalized character +(or two) and some optional field widths. + +``FormatControl<>`` +----------------------- + +This class template traverses a ``FORMAT`` statement's contents (or ``FMT=`` +character value) to extract data edit descriptors like ``E20.14`` to +serve each item in an I/O data transfer statement's *io-list*\ , +making callbacks to an instance of its class template argument +along the way to effect character literal output and record +positioning. 
+The Fortran language standard defines formatted I/O as if the ``FORMAT`` +string were driving the traversal of the data items in the *io-list*\ , +but our implementation reverses that perspective to allow a more +convenient (for the compiler) I/O runtime support library API design +in which each data item is presented to the library with a distinct +type-dependent call. + +Clients of ``FormatControl`` instantiations call its ``GetNextDataEdit()`` +member function to acquire the next data edit descriptor to be processed +from the format, and ``FinishOutput()`` to flush out any remaining +output strings or record positionings at the end of the *io-list*. + +The ``DefaultFormatControlCallbacks`` structure summarizes the API +expected by ``FormatControl`` from its class template actual arguments. + +``OpenFile`` +---------------- + +This class encapsulates all (I hope) the operating system interfaces +used to interact with the host's filesystems for operations on +external units. +Asynchronous I/O interfaces are faked for now with synchronous +operations and deferred results. + +``ConnectionState`` +----------------------- + +An active connection to an external or internal unit maintains +the common parts of its state in this subclass of ``ConnectionAttributes``. +The base class holds state that should not change during the +lifetime of the connection, while the subclass maintains state +that may change during I/O statement execution. + +``InternalDescriptorUnit`` +------------------------------ + +When I/O is being performed from/to a Fortran ``CHARACTER`` array +rather than an external file, this class manages the standard +interoperable descriptor used to access its elements as records. +It has the necessary interfaces to serve as an actual argument +to the ``FormatControl`` class template. 
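To make the idea of treating a ``CHARACTER`` array as a sequence of records concrete, here is a small sketch under simplifying assumptions. ``SketchInternalUnit`` and its members are hypothetical, and it works on a raw buffer with a fixed record length rather than on the standard interoperable descriptor that the real class manages.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical model of internal I/O: a CHARACTER array viewed as a sequence
// of fixed-length, blank-padded records. This is not the runtime's class.
class SketchInternalUnit {
public:
  // `storage` is unowned and holds `records` elements of `recordLength` bytes.
  SketchInternalUnit(char *storage, std::size_t recordLength,
      std::size_t records)
      : storage_{storage}, recordLength_{recordLength}, records_{records} {
    assert(storage_ && recordLength_ > 0);
  }

  // Copies bytes into the current record; fails on overflow or end of array.
  bool Emit(const char *data, std::size_t bytes) {
    if (currentRecord_ >= records_ || position_ + bytes > recordLength_) {
      return false;
    }
    std::memcpy(storage_ + currentRecord_ * recordLength_ + position_, data,
        bytes);
    position_ += bytes;
    return true;
  }

  // Blank-fills the remainder of the current record and moves to the next.
  bool AdvanceRecord() {
    if (currentRecord_ >= records_) {
      return false;
    }
    std::memset(storage_ + currentRecord_ * recordLength_ + position_, ' ',
        recordLength_ - position_);
    ++currentRecord_;
    position_ = 0;
    return true;
  }

private:
  char *storage_;
  std::size_t recordLength_, records_;
  std::size_t currentRecord_{0}, position_{0};
};
```

Each array element plays the role of one record, so advancing past the last element is reported as a failure instead of growing the storage.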
+ +``FileFrame<>`` +------------------- + +This CRTP class template isolates all of the complexity involved between +an external unit's ``OpenFile`` and the buffering requirements +imposed by the capabilities of Fortran ``FORMAT`` control edit +descriptors that allow repositioning within the current record. +Its interface enables its clients to define a "frame" (my term, +not Fortran's) that is a contiguous range of bytes that are +or may soon be in the file. +This frame is defined as a file offset and a byte size. +The ``FileFrame`` instance manages an internal circular buffer +with two essential guarantees: + + +#. The most recently requested frame is present in the buffer + and contiguous in memory. +#. Any extra data after the frame that may have been read from + the external unit will be preserved, so that it's safe to + read from a socket, pipe, or tape and not have to worry about + repositioning and rereading. + +In end-of-file situations, it's possible that a request to read +a frame may come up short. + +As a CRTP class template, ``FileFrame`` accesses the raw filesystem +facilities it needs from ``*this``. + +``ExternalFileUnit`` +------------------------ + +This class mixes in ``ConnectionState``\ , ``OpenFile``\ , and +``FileFrame`` to represent the state of an open +(or soon to be opened) external file descriptor as a Fortran +I/O unit. +It has the contextual APIs required to serve as a template actual +argument to ``FormatControl``. +And it contains a ``std::variant<>`` suitable for holding the +state of the active I/O statement in progress on the unit +(see below). + +``ExternalFileUnit`` instances reside in a ``Map`` that is allocated +as a static variable and indexed by Fortran unit number. +Static member functions ``LookUp()``\ , ``LookUpOrCrash()``\ , and ``LookUpOrCreate()`` +probe the map to convert Fortran ``UNIT=`` numbers from I/O statements +into references to active units. 
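The unit-number lookup described above might be pictured as follows. This is a sketch only: ``SketchUnit`` and ``SketchUnitMap`` are hypothetical names, ``std::map`` stands in for the runtime's own ``Map``, and a crashing variant in the spirit of ``LookUpOrCrash()`` is omitted.

```cpp
#include <cassert>
#include <cstddef>
#include <map>

// Hypothetical, simplified stand-ins; the real classes carry much more state.
struct SketchUnit {
  int unitNumber{-1};
  bool isConnected{false};
};

class SketchUnitMap {
public:
  // Returns the unit if it exists, else nullptr.
  SketchUnit *LookUp(int n) {
    auto iter = units_.find(n);
    return iter == units_.end() ? nullptr : &iter->second;
  }

  // Returns the existing unit or creates a fresh, unconnected one.
  SketchUnit &LookUpOrCreate(int n) {
    auto [iter, inserted] = units_.try_emplace(n);
    if (inserted) {
      iter->second.unitNumber = n;
    }
    return iter->second;
  }

  std::size_t size() const { return units_.size(); }

private:
  // A node-based container keeps references to units valid as others are
  // added, which matters when an I/O statement holds on to its unit.
  std::map<int, SketchUnit> units_;
};
```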
+ +``IoStatementBase`` +----------------------- + +The subclasses of ``IoStatementBase`` each encapsulate and maintain +the state of one active Fortran I/O statement across the several +I/O runtime library API function calls it may comprise. +The subclasses handle the distinctions between internal vs. external I/O, +formatted vs. list-directed vs. unformatted I/O, input vs. output, +and so on. + +``IoStatementBase`` inherits default ``FORMAT`` processing callbacks and +an ``IoErrorHandler``. +Each of the ``IoStatementBase`` classes that pertain to formatted I/O +support the contextual callback interfaces needed by ``FormatControl``\ , +overriding the default callbacks of the base class, which crash if +called inappropriately (e.g., if a ``CLOSE`` statement somehow +passes a data item from an *io-list*\ ). + +The lifetimes of these subclasses' instances each begin with a user +program call to an I/O API routine with a name like ``BeginExternalListOutput()`` +and persist until ``EndIoStatement()`` is called. + +To reduce dynamic memory allocation, *external* I/O statements allocate +their per-statement state class instances in space reserved in the +``ExternalFileUnit`` instance. +Internal I/O statements currently use dynamic allocation, but +the I/O API supports a means whereby the code generated for the Fortran +program may supply stack space to the I/O runtime support library +for this purpose. + +``IoStatementState`` +------------------------ + +F18's Fortran I/O runtime support library defines and implements an API +that uses a sequence of function calls to implement each Fortran I/O +statement. +The state of each I/O statement in progress is maintained in some +subclass of ``IoStatementBase``\ , as noted above. 
+The purpose of ``IoStatementState`` is to provide generic access +to the specific state classes without recourse to C++ ``virtual`` +functions or function pointers, language features that may not be +available to us in some important execution environments. +``IoStatementState`` comprises a ``std::variant<>`` of wrapped references +to the various possibilities, and uses ``std::visit()`` to +access them as needed by the I/O API calls that process each specifier +in the I/O *control-list* and each item in the *io-list*. + +Pointers to ``IoStatementState`` instances are the ``Cookie`` type returned +in the I/O API for ``Begin...`` I/O statement calls, passed back for +the *control-list* specifiers and *io-list* data items, and consumed +by the ``EndIoStatement()`` call at the end of the statement. + +Storage for ``IoStatementState`` is reserved in ``ExternalFileUnit`` for +external I/O units, and in the various final subclasses for internal +I/O statement states otherwise. + +Since Fortran permits a ``CLOSE`` statement to reference a nonexistent +unit, the library has to treat that (expected to be rare) situation +as a weird variation of internal I/O since there's no ``ExternalFileUnit`` +available to hold its ``IoStatementBase`` subclass or ``IoStatementState``. + +A Narrative Overview Of ``PRINT *, 'HELLO, WORLD'`` +======================================================= + + +#. When the compiled Fortran program begins execution at the ``main()`` + entry point exported from its main program, it calls ``ProgramStart()`` + with its arguments and environment. +#. The generated code calls ``BeginExternalListOutput()`` to + start the sequence of calls that implement the ``PRINT`` statement. + Since the Fortran runtime I/O library has not yet been used in + this process, its data structures are initialized on this + first call, and Fortran I/O units 5 and 6 are connected with + the standard input and output file descriptors (respectively). 
+ The default unit code is converted to 6 and passed to + ``ExternalFileUnit::LookUpOrCrash()``\ , which returns a reference to + unit 6's instance. +#. We check that the unit was opened for formatted I/O. +#. ``ExternalFileUnit::BeginIoStatement<>()`` is called to initialize + an instance of ``ExternalListIoStatementState`` in the unit, + point to it with an ``IoStatementState``\ , and return a reference to + that object whose address will be the ``Cookie`` for this statement. +#. The generated code calls ``OutputAscii()`` with that cookie and the + address and length of the string. +#. ``OutputAscii()`` confirms that the cookie corresponds to an output + statement and determines that it's list-directed. +#. ``ListDirectedStatementState::EmitLeadingSpaceOrAdvance()`` + emits the required initial space on the new current output record + by calling ``IoStatementState::GetConnectionState()`` to locate + the connection state, determining from the record position state + that the space is necessary, and calling ``IoStatementState::Emit()`` + to cough it out. That call is redirected to ``ExternalFileUnit::Emit()``\ , + which calls ``FileFrame::WriteFrame()`` to extend + the frame of the current record and then ``memcpy()`` to fill its + first byte with the space. +#. Back in ``OutputAscii()``\ , the mutable modes and connection state + of the ``IoStatementState`` are queried to see whether we're in a + ``WRITE(UNIT=,FMT=,DELIM=)`` statement with a delimited specifier. + If we were, the library would emit the appropriate quote marks, + double up any instances of that character in the text, and split the + text over multiple records if it's long. +#. But we don't have a delimiter, so ``OutputAscii()`` just carves + up the text into record-sized chunks and emits them. There's just + one chunk for our short ``CHARACTER`` string value in this example. 
+ It's passed to ``IoStatementState::Emit()``\ , which (as above) is + redirected to ``ExternalFileUnit::Emit()``\ , which interacts with the + frame to extend the frame and ``memcpy`` data into the buffer. +#. A flag is set in ``ListDirectedStatementState`` to remember + that the last item emitted in this list-directed output statement + was an undelimited ``CHARACTER`` value, so that if the next item is + also an undelimited ``CHARACTER``\ , no interposing space will be emitted + between them. +#. ``OutputAscii()`` returns ``true`` to its caller. +#. The generated code calls ``EndIoStatement()``\ , which is redirected to + ``ExternalIoStatementState``\ 's override of that function. + As this is not a non-advancing I/O statement, ``ExternalFileUnit::AdvanceRecord()`` + is called to end the record. Since this is a sequential formatted + file, a newline is emitted. +#. If unit 6 is connected to a terminal, the buffer is flushed. + ``FileFrame::Flush()`` drives ``ExternalFileUnit::Write()`` + to push out the data in maximal contiguous chunks, dealing with any + short writes that might occur, and collecting I/O errors along the way. + This statement has no ``ERR=`` label or ``IOSTAT=`` specifier, so errors + arriving at ``IoErrorHandler::SignalErrno()`` will cause an immediate + crash. +#. ``ExternalIoStatementBase::EndIoStatement()`` is called. + It gets the final ``IOSTAT=`` value from ``IoStatementBase::EndIoStatement()``\ , + tells the ``ExternalFileUnit`` that no I/O statement remains active, and + returns the I/O status value back to the program. +#. Eventually, the program calls ``ProgramEndStatement()``\ , which + calls ``ExternalFileUnit::CloseAll()``\ , which flushes and closes all + open files. If the standard output were not a terminal, the output + would be written now with the same sequence of calls as above. +#. ``exit(EXIT_SUCCESS)``. 
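The cookie-driven call sequence in the narrative above, together with the ``std::variant<>`` and ``std::visit()`` dispatch used in place of ``virtual`` functions, can be sketched in miniature. Everything prefixed with ``Sketch`` is hypothetical. The real entry points take more arguments, and the real statement-state classes crash rather than return ``false`` when misused.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <variant>

// Hypothetical statement-state classes; only the dispatch pattern matters.
struct SketchListOutput {
  std::string record;
  bool Emit(const std::string &text) {
    if (record.empty()) {
      record += ' '; // list-directed output starts a record with a space
    }
    record += text;
    return true;
  }
  int End() {
    record += '\n'; // advancing output ends the record
    return 0;       // plays the role of the final IOSTAT= value
  }
};

struct SketchClose {
  bool Emit(const std::string &) { return false; } // CLOSE has no io-list
  int End() { return 0; }
};

// A variant of wrapped references stands in for virtual dispatch.
class SketchStatementState {
public:
  template <typename A>
  explicit SketchStatementState(A &x) : u_{std::ref(x)} {}
  bool Emit(const std::string &text) {
    return std::visit([&](auto &x) { return x.get().Emit(text); }, u_);
  }
  int End() {
    return std::visit([](auto &x) { return x.get().End(); }, u_);
  }

private:
  std::variant<std::reference_wrapper<SketchListOutput>,
      std::reference_wrapper<SketchClose>>
      u_;
};

using SketchCookie = SketchStatementState *;

// API-style entry points that receive the cookie, as generated code would.
inline bool SketchOutputAscii(
    SketchCookie cookie, const char *p, std::size_t length) {
  assert(cookie);
  return cookie->Emit(std::string(p, length));
}
inline int SketchEndIoStatement(SketchCookie cookie) {
  assert(cookie);
  return cookie->End();
}
```

Because the variant holds references, the statement state itself can live wherever the caller reserves room for it, which matches the storage arrangement the text describes for external units.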
diff --git a/flang/docs/ImplementingASemanticCheck.rst b/flang/docs/ImplementingASemanticCheck.rst new file mode 100644 --- /dev/null +++ b/flang/docs/ImplementingASemanticCheck.rst @@ -0,0 +1,865 @@ +Introduction +============ + +I recently added a semantic check to the f18 compiler front end. This document +describes my thought process and the resulting implementation. + +For more information about the compiler, start with the +`compiler overview `_. + +Problem definition +================== + +In the 2018 Fortran standard, section 11.1.7.4.3, paragraph 2, states that: + +.. code-block:: + + Except for the incrementation of the DO variable that occurs in step (3), the DO variable + shall neither be redefined nor become undefined while the DO construct is active. + +One of the ways that DO variables might be redefined is if they are passed to +functions with dummy arguments whose ``INTENT`` is ``INTENT(OUT)`` or +``INTENT(INOUT)``. I implemented this semantic check. Specifically, I changed +the compiler to emit an error message if an active DO variable was passed to a +dummy argument of a FUNCTION with INTENT(OUT). Similarly, I had the compiler +emit a warning if an active DO variable was passed to a dummy argument with +INTENT(INOUT). Previously, I had implemented similar checks for SUBROUTINE +calls. + +Creating a test +=============== + +My first step was to create a test case to cause the problem. I called it testfun.f90 and used it to check the behavior of other Fortran compilers. Here's the initial version: + +.. code-block:: fortran + + subroutine s() + Integer :: ivar, jvar + + do ivar = 1, 10 + jvar = intentOutFunc(ivar) ! 
Error since ivar is a DO variable + end do + + contains + function intentOutFunc(dummyArg) + integer, intent(out) :: dummyArg + integer :: intentOutFunc + + dummyArg = 216 + end function intentOutFunc + end subroutine s + +I verified that other Fortran compilers produced an error message at the point +of the call to ``intentOutFunc()``\ : + +.. code-block:: fortran + + jvar = intentOutFunc(ivar) ! Error since ivar is a DO variable + +I also used this program to produce a parse tree for the program using the command: + +.. code-block:: bash + + f18 -fdebug-dump-parse-tree -fparse-only testfun.f90 + +Here's the relevant fragment of the parse tree produced by the compiler: + +.. code-block:: + + | | ExecutionPartConstruct -> ExecutableConstruct -> DoConstruct + | | | NonLabelDoStmt + | | | | LoopControl -> LoopBounds + | | | | | Scalar -> Name = 'ivar' + | | | | | Scalar -> Expr = '1_4' + | | | | | | LiteralConstant -> IntLiteralConstant = '1' + | | | | | Scalar -> Expr = '10_4' + | | | | | | LiteralConstant -> IntLiteralConstant = '10' + | | | Block + | | | | ExecutionPartConstruct -> ExecutableConstruct -> ActionStmt -> AssignmentStmt = 'jvar=intentoutfunc(ivar)' + | | | | | Variable -> Designator -> DataRef -> Name = 'jvar' + | | | | | Expr = 'intentoutfunc(ivar)' + | | | | | | FunctionReference -> Call + | | | | | | | ProcedureDesignator -> Name = 'intentoutfunc' + | | | | | | | ActualArgSpec + | | | | | | | | ActualArg -> Expr = 'ivar' + | | | | | | | | | Designator -> DataRef -> Name = 'ivar' + | | | EndDoStmt -> + +Note that this fragment of the tree only shows four ``parser::Expr`` nodes, +but the full parse tree also contained a fifth ``parser::Expr`` node for the +constant 216 in the statement: + +.. code-block:: fortran + + dummyArg = 216 + +Analysis and implementation planning +==================================== + +I then considered what I needed to do. 
I needed to detect situations where an +active DO variable was passed to a dummy argument with ``INTENT(OUT)`` or +``INTENT(INOUT)``. Once I detected such a situation, I needed to produce a +message that highlighted the erroneous source code. + +Deciding where to add the code to the compiler +---------------------------------------------- + +This new semantic check would depend on several types of information -- the +parse tree, source code location information, symbols, and expressions. Thus I +needed to put my new code in a place in the compiler after the parse tree had +been created, name resolution had already happened, and expression semantic +checking had already taken place. + +Most semantic checks for statements are implemented by walking the parse tree +and performing analysis on the nodes they visit. My plan was to use this +method. The infrastructure for walking the parse tree for statement semantic +checking is implemented in the file ``lib/Semantics/semantics.cpp``. +Here's a fragment of the declaration of the framework's parse tree visitor from +``lib/Semantics/semantics.cpp``\ : + +.. code-block:: C++ + + // A parse tree visitor that calls Enter/Leave functions from each checker + // class C supplied as template parameters. Enter is called before the node's + // children are visited, Leave is called after. No two checkers may have the + // same Enter or Leave function. Each checker must be constructible from + // SemanticsContext and have BaseChecker as a virtual base class. + template <typename... C> class SemanticsVisitor : public virtual C... { + public: + using C::Enter...; + using C::Leave...; + using BaseChecker::Enter; + using BaseChecker::Leave; + SemanticsVisitor(SemanticsContext &context) + : C{context}..., context_{context} {} + ... + +Since FUNCTION calls are a kind of expression, I was planning to base my +implementation on the contents of ``parser::Expr`` nodes. 
I would need to define +either an ``Enter()`` or ``Leave()`` function whose parameter was a ``parser::Expr`` +node. Here's the declaration I put into ``lib/Semantics/check-do.h``\ : + +.. code-block:: C++ + + void Leave(const parser::Expr &); + +The ``Enter()`` functions get called at the time the node is first visited -- +that is, before its children. The ``Leave()`` function gets called after the +children are visited. For my check the visitation order didn't matter, so I +arbitrarily chose to implement the ``Leave()`` function to visit the parse tree +node. + +Since my semantic check was focused on DO CONCURRENT statements, I added it to +the file ``lib/Semantics/check-do.cpp`` where most of the semantic checking for +DO statements already lived. + +Taking advantage of prior work +------------------------------ + +When implementing a similar check for SUBROUTINE calls, I created utility +functions in ``lib/Semantics/semantics.cpp`` to emit messages if +a symbol corresponding to an active DO variable was being potentially modified: + +.. code-block:: C++ + + void WarnDoVarRedefine(const parser::CharBlock &location, const Symbol &var); + void CheckDoVarRedefine(const parser::CharBlock &location, const Symbol &var); + +The first function is intended for dummy arguments of ``INTENT(INOUT)`` and +the second for ``INTENT(OUT)``. + +Thus I needed three pieces of +information -- + + +#. the source location of the erroneous text, +#. the ``INTENT`` of the associated dummy argument, and +#. the relevant symbol passed as the actual argument. + +The first and third are needed since they're required to call the utility +functions. The second is needed to determine whether to call them. + +Finding the source location +--------------------------- + +The source code location information that I'd need for the error message must +come from the parse tree. 
I looked in the file +``include/flang/Parser/parse-tree.h`` and determined that a ``struct Expr`` +contained source location information since it had the field ``CharBlock +source``. Thus, if I visited a ``parser::Expr`` node, I could get the source +location information for the associated expression. + +Determining the ``INTENT`` +------------------------------ + +I knew that I could find the ``INTENT`` of the dummy argument associated with the +actual argument from the function called ``dummyIntent()`` in the class +``evaluate::ActualArgument`` in the file ``include/flang/Evaluate/call.h``. So +if I could find an ``evaluate::ActualArgument`` in an expression, I could + determine the ``INTENT`` of the associated dummy argument. I knew that it was + valid to call ``dummyIntent()`` because the data on which ``dummyIntent()`` + depends is established during semantic processing for expressions, and the + semantic processing for expressions happens before semantic checking for DO + constructs. + +In my prior work on checking the INTENT of arguments for SUBROUTINE calls, +the parse tree held a node for the call (a ``parser::CallStmt``\ ) that contained +an ``evaluate::ProcedureRef`` node. + +.. code-block:: C++ + + struct CallStmt { + WRAPPER_CLASS_BOILERPLATE(CallStmt, Call); + mutable std::unique_ptr> + typedCall; // filled by semantics + }; + +The ``evaluate::ProcedureRef`` contains a list of ``evaluate::ActualArgument`` +nodes. I could then find the INTENT of a dummy argument from the +``evaluate::ActualArgument`` node. + +For a FUNCTION call, though, there is no similar way to get from a parse tree +node to an ``evaluate::ProcedureRef`` node. But I knew that there was an +existing framework used in DO construct semantic checking that traversed an +``evaluate::Expr`` node collecting ``semantics::Symbol`` nodes. I guessed that I'd +be able to use a similar framework to traverse an ``evaluate::Expr`` node to +find all of the ``evaluate::ActualArgument`` nodes. 
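The kind of traversal I had in mind can be pictured with a toy model. The sketch below is not flang code: ``ToyExpr``, ``ToyActualArgument``, and ``CollectToyArguments()`` are hypothetical stand-ins for the much richer ``evaluate`` types, and the set holds plain pointers where the compiler uses reference wrappers.

```cpp
#include <cassert>
#include <memory>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy expression tree; flang's evaluate::Expr is far richer.
struct ToyExpr;

struct ToyActualArgument {
  std::string dummyIntent; // "IN", "OUT", "INOUT", or "" for default
  std::unique_ptr<ToyExpr> value;
};

struct ToyExpr {
  std::string name;                               // variable or function name
  std::vector<ToyActualArgument> arguments;       // non-empty for calls
  std::vector<std::unique_ptr<ToyExpr>> operands; // operator operands
};

// A set of addresses: inserting the same node twice keeps a single entry.
using ToyArgumentSet = std::set<const ToyActualArgument *>;

// Recursively gathers the address of every actual argument in the tree.
inline void CollectToyArguments(const ToyExpr &expr, ToyArgumentSet &result) {
  for (const ToyActualArgument &arg : expr.arguments) {
    result.insert(&arg);
    if (arg.value) {
      CollectToyArguments(*arg.value, result);
    }
  }
  for (const auto &operand : expr.operands) {
    if (operand) {
      CollectToyArguments(*operand, result);
    }
  }
}
```

Once the arguments are gathered into a set, each one can be inspected on its own, with no need to know which call it belongs to.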
+ +Note that the compiler has multiple types called ``Expr``. One is in the +``parser`` namespace. ``parser::Expr`` is defined in the file +``include/flang/Parser/parse-tree.h``. It represents a parsed expression that +maps directly to the source code and has fields that specify any operators in +the expression, the operands, and the source position of the expression. + +Additionally, in the namespace ``evaluate``\ , there are ``evaluate::Expr`` +template classes defined in the file ``include/flang/Evaluate/expression.h``. +These are parameterized over the various types of Fortran and constitute a +suite of strongly-typed representations of valid Fortran expressions of type +``T`` that have been fully elaborated with conversion operations and subjected to +constant folding. After an expression has undergone semantic analysis, the +field ``typedExpr`` in the ``parser::Expr`` node is filled in with a pointer that +owns an instance of ``evaluate::Expr``\ , the most general representation +of an analyzed expression. + +All of the declarations associated with both FUNCTION and SUBROUTINE calls are +in ``include/flang/Evaluate/call.h``. An ``evaluate::FunctionRef`` inherits from +an ``evaluate::ProcedureRef`` which contains the list of +``evaluate::ActualArgument`` nodes. But the relationship between an +``evaluate::FunctionRef`` node and its associated arguments is not relevant. I +only needed to find the ``evaluate::ActualArgument`` nodes in an expression. +They hold all of the information I needed. + +So my plan was to start with the ``parser::Expr`` node and extract its +associated ``evaluate::Expr`` field. I would then traverse the +``evaluate::Expr`` tree collecting all of the ``evaluate::ActualArgument`` +nodes. I would look at each of these nodes to determine the ``INTENT`` of +the associated dummy argument. + +This combination of the traversal framework and ``dummyIntent()`` would give +me the ``INTENT`` of all of the dummy arguments in a FUNCTION call. 
Thus, I +would have the second piece of information I needed. + +Determining if the actual argument is a variable +------------------------------------------------ + +I also guessed that I could determine if the ``evaluate::ActualArgument`` +consisted of a variable. + +Once I had a symbol for the variable, I could call one of the functions: + +.. code-block:: C++ + + void WarnDoVarRedefine(const parser::CharBlock &, const Symbol &); + void CheckDoVarRedefine(const parser::CharBlock &, const Symbol &); + +to emit the messages. + +If my plans worked out, this would give me the three pieces of information I +needed -- the source location of the erroneous text, the ``INTENT`` of the dummy +argument, and a symbol that I could use to determine whether the actual +argument was an active DO variable. + +Implementation +============== + +Adding a parse tree visitor +--------------------------- + +I started my implementation by adding a visitor for ``parser::Expr`` nodes. +Since this analysis is part of DO construct checking, I did this in +``lib/Semantics/check-do.cpp``. I added a print statement to the visitor to +verify that my new code was actually getting executed. + +In ``lib/Semantics/check-do.h``\ , I added the declaration for the visitor: + +.. code-block:: C++ + + void Leave(const parser::Expr &); + +In ``lib/Semantics/check-do.cpp``\ , I added an (almost empty) implementation: + +.. code-block:: C++ + + void DoChecker::Leave(const parser::Expr &) { + std::cout << "In Leave for parser::Expr\n"; + } + +I then built the compiler with these changes and ran it on my test program. +This time, I made sure to invoke semantic checking. Here's the command I used: + +.. code-block:: bash + + f18 -fdebug-resolve-names -fdebug-dump-parse-tree -funparse-with-symbols testfun.f90 + +This produced the output: + +.. 
code-block:: + + In Leave for parser::Expr + In Leave for parser::Expr + In Leave for parser::Expr + In Leave for parser::Expr + In Leave for parser::Expr + +This made sense since the parse tree contained five ``parser::Expr`` nodes. +So far, so good. Note that a ``parser::Expr`` node has a field with the +source position of the associated expression (\ ``CharBlock source``\ ). So I +now had one of the three pieces of information needed to detect and report +errors. + +Collecting the actual arguments +------------------------------- + +To get the ``INTENT`` of the dummy arguments and the ``semantics::Symbol`` associated with the +actual argument, I needed to find all of the actual arguments embedded in an +expression that contained a FUNCTION call. So my next step was to write the +framework to walk the ``evaluate::Expr`` to gather all of the +``evaluate::ActualArgument`` nodes. The code that I planned to model it on +was the existing infrastructure that collected all of the ``semantics::Symbol`` nodes from an +``evaluate::Expr``. I found this implementation in +``lib/Evaluate/tools.cpp``\ : + +.. code-block:: C++ + + struct CollectSymbolsHelper + : public SetTraverse<CollectSymbolsHelper, semantics::SymbolSet> { + using Base = SetTraverse<CollectSymbolsHelper, semantics::SymbolSet>; + CollectSymbolsHelper() : Base{*this} {} + using Base::operator(); + semantics::SymbolSet operator()(const Symbol &symbol) const { + return {symbol}; + } + }; + template <typename A> semantics::SymbolSet CollectSymbols(const A &x) { + return CollectSymbolsHelper{}(x); + } + +Note that the ``CollectSymbols()`` function returns a ``semantics::SymbolSet``\ , +which is declared in ``include/flang/Semantics/symbol.h``\ : + +.. code-block:: C++ + + using SymbolSet = std::set<SymbolRef>; + +This infrastructure yields a collection based on ``std::set<>``. Using an +``std::set<>`` means that if the same object is inserted twice, the +collection only gets one copy. This was the behavior that I wanted. + +Here's a sample invocation of ``CollectSymbols()`` that I found: + +.. 
code-block:: C++ + + if (const auto *expr{GetExpr(parsedExpr)}) { + for (const Symbol &symbol : evaluate::CollectSymbols(*expr)) { + +I noted that a ``SymbolSet`` did not actually contain an +``std::set<Symbol>``. This wasn't surprising since we don't want to put the +full ``semantics::Symbol`` objects into the set. Ideally, we would be able to create an +``std::set<Symbol &>`` (a set of C++ references to symbols). But C++ doesn't +support sets that contain references. This limitation is part of the rationale +for the f18 implementation of type ``common::Reference<>``\ , which is defined in +``include/flang/Common/reference.h``. + +``SymbolRef``\ , the specialization of the template ``common::Reference<>`` for +``semantics::Symbol``\ , is declared in the file +``include/flang/Semantics/symbol.h``\ : + +.. code-block:: C++ + + using SymbolRef = common::Reference<const Symbol>; + +So to implement something that would collect ``evaluate::ActualArgument`` +nodes from an ``evaluate::Expr``\ , I first defined the required types +``ActualArgumentRef`` and ``ActualArgumentSet``. Since these are being +used exclusively for DO construct semantic checking (currently), I put their +definitions into ``lib/Semantics/check-do.cpp``\ : + +.. code-block:: C++ + + namespace Fortran::evaluate { + using ActualArgumentRef = common::Reference<const ActualArgument>; + } + + + using ActualArgumentSet = std::set<evaluate::ActualArgumentRef>; + +Since ``ActualArgument`` is in the namespace ``evaluate``\ , I put the +definition for ``ActualArgumentRef`` in that namespace, too. + +I then modeled the code to create an ``ActualArgumentSet`` after the code to +collect a ``SymbolSet`` and put it into ``lib/Semantics/check-do.cpp``\ : + +.. 
code-block:: C++ + + struct CollectActualArgumentsHelper + : public evaluate::SetTraverse<CollectActualArgumentsHelper, ActualArgumentSet> { + using Base = SetTraverse<CollectActualArgumentsHelper, ActualArgumentSet>; + CollectActualArgumentsHelper() : Base{*this} {} + using Base::operator(); + ActualArgumentSet operator()(const evaluate::ActualArgument &arg) const { + return ActualArgumentSet{arg}; + } + }; + + template <typename A> ActualArgumentSet CollectActualArguments(const A &x) { + return CollectActualArgumentsHelper{}(x); + } + + template ActualArgumentSet CollectActualArguments(const SomeExpr &); + +Unfortunately, when I tried to build this code, I got an error message saying +``std::set`` requires the ``<`` operator to be defined for its contents. +To fix this, I added a definition for ``<``. I didn't care how ``<`` was +defined, so I just used the address of the object: + +.. code-block:: C++ + + inline bool operator<(ActualArgumentRef x, ActualArgumentRef y) { + return &*x < &*y; + } + +I was surprised when this did not make the error message saying that I needed +the ``<`` operator go away. Eventually, I figured out that the definition of +the ``<`` operator needed to be in the ``evaluate`` namespace. The likely +reason is argument-dependent lookup: ``std::set`` resolves ``<`` from inside +namespace ``std``\ , so the operator is found only in a namespace associated +with its operand type, such as ``evaluate``. Once I put +it there, everything compiled successfully. Here's the code that worked: + +.. code-block:: C++ + + namespace Fortran::evaluate { + using ActualArgumentRef = common::Reference<const ActualArgument>; + + inline bool operator<(ActualArgumentRef x, ActualArgumentRef y) { + return &*x < &*y; + } + } + +I then modified my visitor for the ``parser::Expr`` to invoke my new collection +framework. To verify that it was actually doing something, I printed out the +number of ``evaluate::ActualArgument`` nodes that it collected. Note the +call to ``GetExpr()`` in the invocation of ``CollectActualArguments()``. I +modeled this on similar code that collected a ``SymbolSet`` described above: + +.. 
code-block:: C++ + + void DoChecker::Leave(const parser::Expr &parsedExpr) { + std::cout << "In Leave for parser::Expr\n"; + ActualArgumentSet argSet{CollectActualArguments(GetExpr(parsedExpr))}; + std::cout << "Number of arguments: " << argSet.size() << "\n"; + } + +I compiled and tested this code on my little test program. Here's the output that I got: + +.. code-block:: + + In Leave for parser::Expr + Number of arguments: 0 + In Leave for parser::Expr + Number of arguments: 0 + In Leave for parser::Expr + Number of arguments: 0 + In Leave for parser::Expr + Number of arguments: 1 + In Leave for parser::Expr + Number of arguments: 0 + +So most of the ``parser::Expr``\ nodes contained no actual arguments, but the +fourth expression in the parse tree walk contained a single argument. This may +seem wrong since the third ``parser::Expr`` node in the file contains the +``FunctionReference`` node along with the arguments that we're gathering. +But since the tree walk function is being called upon leaving a +``parser::Expr`` node, the function visits the ``parser::Expr`` node +associated with the ``parser::ActualArg`` node before it visits the +``parser::Expr`` node associated with the ``parser::FunctionReference`` +node. + +So far, so good. + +Finding the ``INTENT`` of the dummy argument +------------------------------------------------ + +I now wanted to find the ``INTENT`` of the dummy argument associated with the +arguments in the set. As mentioned earlier, the type +``evaluate::ActualArgument`` has a member function called ``dummyIntent()`` +that gives this value. So I augmented my code to print out the ``INTENT``\ : + +.. 
code-block:: C++ + + void DoChecker::Leave(const parser::Expr &parsedExpr) { + std::cout << "In Leave for parser::Expr\n"; + ActualArgumentSet argSet{CollectActualArguments(GetExpr(parsedExpr))}; + std::cout << "Number of arguments: " << argSet.size() << "\n"; + for (const evaluate::ActualArgumentRef &argRef : argSet) { + common::Intent intent{argRef->dummyIntent()}; + switch (intent) { + case common::Intent::In: std::cout << "INTENT(IN)\n"; break; + case common::Intent::Out: std::cout << "INTENT(OUT)\n"; break; + case common::Intent::InOut: std::cout << "INTENT(INOUT)\n"; break; + default: std::cout << "default INTENT\n"; + } + } + } + +I then rebuilt my compiler and ran it on my test case. This produced the following output: + +.. code-block:: + + In Leave for parser::Expr + Number of arguments: 0 + In Leave for parser::Expr + Number of arguments: 0 + In Leave for parser::Expr + Number of arguments: 0 + In Leave for parser::Expr + Number of arguments: 1 + INTENT(OUT) + In Leave for parser::Expr + Number of arguments: 0 + +I then modified my test case to convince myself that I was getting the correct +``INTENT`` for ``IN``\ , ``INOUT``\ , and default cases. + +So far, so good. + +Finding the symbols for arguments that are variables +---------------------------------------------------- + +The third and last piece of information I needed was to determine if a variable +was being passed as an actual argument. In such cases, I wanted to get the +symbol table node (\ ``semantics::Symbol``\ ) for the variable. My starting point was the +``evaluate::ActualArgument`` node. + +I was unsure of how to do this, so I browsed through existing code to look for +how it treated ``evaluate::ActualArgument`` objects. Since most of the code that deals with the ``evaluate`` namespace is in the lib/Evaluate directory, I looked there. I ran ``grep`` on all of the ``.cpp`` files looking for +uses of ``ActualArgument``. 
One of the first hits I got was in ``lib/Evaluate/call.cpp`` in the definition of ``ActualArgument::GetType()``\ : + +.. code-block:: C++ + + std::optional<DynamicType> ActualArgument::GetType() const { + if (const Expr<SomeType> *expr{UnwrapExpr()}) { + return expr->GetType(); + } else if (std::holds_alternative<AssumedType>(u_)) { + return DynamicType::AssumedType(); + } else { + return std::nullopt; + } + } + +I noted the call to ``UnwrapExpr()`` that yielded a value of +``Expr<SomeType>``. So I guessed that I could use this member function to +get an ``evaluate::Expr`` on which I could perform further analysis. + +I also knew that the header file ``include/flang/Evaluate/tools.h`` held many +utility functions for dealing with ``evaluate::Expr`` objects. I was hoping to +find something that would determine if an ``evaluate::Expr`` was a variable. So +I searched for ``IsVariable`` and got a hit immediately. + +.. code-block:: C++ + + template <typename A> bool IsVariable(const A &x) { + if (auto known{IsVariableHelper{}(x)}) { + return *known; + } else { + return false; + } + } + +But I actually needed more than just the knowledge that an ``evaluate::Expr`` was +a variable. I needed the ``semantics::Symbol`` associated with the variable. So +I searched in ``include/flang/Evaluate/tools.h`` for functions that returned a +``semantics::Symbol``. I found the following: + +.. code-block:: C++ + + // If an expression is simply a whole symbol data designator, + // extract and return that symbol, else null. + template <typename A> const Symbol *UnwrapWholeSymbolDataRef(const A &x) { + if (auto dataRef{ExtractDataRef(x)}) { + if (const SymbolRef * p{std::get_if<SymbolRef>(&dataRef->u)}) { + return &p->get(); + } + } + return nullptr; + } + +This was exactly what I wanted. DO variables must be whole symbols. So I +could try to extract a whole ``semantics::Symbol`` from the ``evaluate::Expr`` in my +``evaluate::ActualArgument``. 
If this extraction resulted in a ``semantics::Symbol`` +that wasn't a ``nullptr``\ , I could then conclude if it was a variable that I +could pass to existing functions that would determine if it was an active DO +variable. + +I then modified the compiler to perform the analysis that I'd guessed would +work: + +.. code-block:: C++ + + void DoChecker::Leave(const parser::Expr &parsedExpr) { + std::cout << "In Leave for parser::Expr\n"; + ActualArgumentSet argSet{CollectActualArguments(GetExpr(parsedExpr))}; + std::cout << "Number of arguments: " << argSet.size() << "\n"; + for (const evaluate::ActualArgumentRef &argRef : argSet) { + if (const SomeExpr * argExpr{argRef->UnwrapExpr()}) { + std::cout << "Got an unwrapped Expr\n"; + if (const Symbol * var{evaluate::UnwrapWholeSymbolDataRef(*argExpr)}) { + std::cout << "Found a whole variable: " << *var << "\n"; + } + } + common::Intent intent{argRef->dummyIntent()}; + switch (intent) { + case common::Intent::In: std::cout << "INTENT(IN)\n"; break; + case common::Intent::Out: std::cout << "INTENT(OUT)\n"; break; + case common::Intent::InOut: std::cout << "INTENT(INOUT)\n"; break; + default: std::cout << "default INTENT\n"; + } + } + } + +Note the line that prints out the symbol table entry for the variable: + +.. code-block:: C++ + + std::cout << "Found a whole variable: " << *var << "\n"; + +The compiler defines the "<<" operator for ``semantics::Symbol``\ , which is handy +for analyzing the compiler's behavior. + +Here's the result of running the modified compiler on my Fortran test case: + +.. code-block:: + + In Leave for parser::Expr + Number of arguments: 0 + In Leave for parser::Expr + Number of arguments: 0 + In Leave for parser::Expr + Number of arguments: 0 + In Leave for parser::Expr + Number of arguments: 1 + Got an unwrapped Expr + Found a whole variable: ivar: ObjectEntity type: INTEGER(4) + INTENT(OUT) + In Leave for parser::Expr + Number of arguments: 0 + +Sweet. 
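The whole-symbol extraction used in this section rests on a general C++ pattern: test which alternative a ``std::variant`` holds with ``std::get_if`` and return a null pointer when it is not the one you want. Here is a minimal, self-contained sketch of that pattern. The names ``ToySymbol``, ``ToyArrayElement``, ``ToyConstant``, ``ToyExpr``, and ``UnwrapWholeToySymbol`` are hypothetical stand-ins that I made up for illustration; they are not the compiler's real types or functions.

```cpp
#include <cassert>
#include <string>
#include <variant>

// Hypothetical stand-ins for illustration only; NOT the compiler's real types.
struct ToySymbol {
  std::string name;
};
struct ToyArrayElement {
  const ToySymbol *base;
  int subscript;
};
struct ToyConstant {
  int value;
};

// A toy expression is a whole symbol, an array element, or a constant.
using ToyExpr = std::variant<const ToySymbol *, ToyArrayElement, ToyConstant>;

// Return the symbol only when the expression is exactly a whole symbol;
// otherwise return nullptr, mirroring the "else null" contract.
const ToySymbol *UnwrapWholeToySymbol(const ToyExpr &expr) {
  if (const auto *p = std::get_if<const ToySymbol *>(&expr)) {
    assert(*p != nullptr); // a whole-symbol alternative always names a symbol
    return *p;
  }
  return nullptr;
}
```

In this toy model, only an expression built directly from a symbol yields a non-null result; an array element or a constant yields ``nullptr``, which is the property the DO variable check relies on.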
+ +Emitting the messages +--------------------- + +At this point, using the source location information from the original +``parser::Expr``\ , I had enough information to plug into the existing +interfaces for emitting messages for active DO variables. I modified the +compiler code accordingly: + +.. code-block:: C++ + + void DoChecker::Leave(const parser::Expr &parsedExpr) { + std::cout << "In Leave for parser::Expr\n"; + ActualArgumentSet argSet{CollectActualArguments(GetExpr(parsedExpr))}; + std::cout << "Number of arguments: " << argSet.size() << "\n"; + for (const evaluate::ActualArgumentRef &argRef : argSet) { + if (const SomeExpr * argExpr{argRef->UnwrapExpr()}) { + std::cout << "Got an unwrapped Expr\n"; + if (const Symbol * var{evaluate::UnwrapWholeSymbolDataRef(*argExpr)}) { + std::cout << "Found a whole variable: " << *var << "\n"; + common::Intent intent{argRef->dummyIntent()}; + switch (intent) { + case common::Intent::In: std::cout << "INTENT(IN)\n"; break; + case common::Intent::Out: + std::cout << "INTENT(OUT)\n"; + context_.CheckDoVarRedefine(parsedExpr.source, *var); + break; + case common::Intent::InOut: + std::cout << "INTENT(INOUT)\n"; + context_.WarnDoVarRedefine(parsedExpr.source, *var); + break; + default: std::cout << "default INTENT\n"; + } + } + } + } + } + +I then ran this code on my test case, and miraculously, got the following +output: + +.. code-block:: + + In Leave for parser::Expr + Number of arguments: 0 + In Leave for parser::Expr + Number of arguments: 0 + In Leave for parser::Expr + Number of arguments: 0 + In Leave for parser::Expr + Number of arguments: 1 + Got an unwrapped Expr + Found a whole variable: ivar: ObjectEntity type: INTEGER(4) + INTENT(OUT) + In Leave for parser::Expr + Number of arguments: 0 + testfun.f90:6:12: error: Cannot redefine DO variable 'ivar' + jvar = intentOutFunc(ivar) + ^^^^^^^^^^^^^^^^^^^ + testfun.f90:5:6: Enclosing DO construct + do ivar = 1, 10 + ^^^^ + +Even sweeter. 
+ +Improving the test case +======================= + +At this point, my implementation seemed to be working. But I was concerned +about the limitations of my test case. So I augmented it to include arguments +other than ``INTENT(OUT)`` and more complex expressions. Luckily, my +augmented test did not reveal any new problems. + +Here's the test I ended up with: + +.. code-block:: Fortran + + subroutine s() + + Integer :: ivar, jvar + + ! This one is OK + do ivar = 1, 10 + jvar = intentInFunc(ivar) + end do + + ! Error for passing a DO variable to an INTENT(OUT) dummy + do ivar = 1, 10 + jvar = intentOutFunc(ivar) + end do + + ! Error for passing a DO variable to an INTENT(OUT) dummy, more complex + ! expression + do ivar = 1, 10 + jvar = 83 + intentInFunc(intentOutFunc(ivar)) + end do + + ! Warning for passing a DO variable to an INTENT(INOUT) dummy + do ivar = 1, 10 + jvar = intentInOutFunc(ivar) + end do + + contains + function intentInFunc(dummyArg) + integer, intent(in) :: dummyArg + integer :: intentInFunc + + intentInFunc = 343 + end function intentInFunc + + function intentOutFunc(dummyArg) + integer, intent(out) :: dummyArg + integer :: intentOutFunc + + dummyArg = 216 + intentOutFunc = 343 + end function intentOutFunc + + function intentInOutFunc(dummyArg) + integer, intent(inout) :: dummyArg + integer :: intentInOutFunc + + dummyArg = 216 + intentInOutFunc = 343 + end function intentInOutFunc + + end subroutine s + +Submitting the pull request +=========================== + +At this point, my implementation seemed functionally complete, so I stripped out all of the debug statements, ran ``clang-format`` on it and reviewed it +to make sure that the names were clear. Here's what I ended up with: + +.. 
code-block:: C++ + + void DoChecker::Leave(const parser::Expr &parsedExpr) { + ActualArgumentSet argSet{CollectActualArguments(GetExpr(parsedExpr))}; + for (const evaluate::ActualArgumentRef &argRef : argSet) { + if (const SomeExpr * argExpr{argRef->UnwrapExpr()}) { + if (const Symbol * var{evaluate::UnwrapWholeSymbolDataRef(*argExpr)}) { + common::Intent intent{argRef->dummyIntent()}; + switch (intent) { + case common::Intent::Out: + context_.CheckDoVarRedefine(parsedExpr.source, *var); + break; + case common::Intent::InOut: + context_.WarnDoVarRedefine(parsedExpr.source, *var); + break; + default:; // INTENT(IN) or default intent + } + } + } + } + } + +I then created a pull request to get review comments. + +Responding to pull request comments +=================================== + +I got feedback suggesting that I use an ``if`` statement rather than a +``case`` statement. Another comment reminded me that I should look at the +code I'd previously written to do a similar check for SUBROUTINE calls to see +if there was an opportunity to share code. This examination resulted in +converting my existing code to the following pair of functions: + +.. 
code-block:: C++ + + static void CheckIfArgIsDoVar(const evaluate::ActualArgument &arg, + const parser::CharBlock location, SemanticsContext &context) { + common::Intent intent{arg.dummyIntent()}; + if (intent == common::Intent::Out || intent == common::Intent::InOut) { + if (const SomeExpr * argExpr{arg.UnwrapExpr()}) { + if (const Symbol * var{evaluate::UnwrapWholeSymbolDataRef(*argExpr)}) { + if (intent == common::Intent::Out) { + context.CheckDoVarRedefine(location, *var); + } else { + context.WarnDoVarRedefine(location, *var); // INTENT(INOUT) + } + } + } + } + } + + void DoChecker::Leave(const parser::Expr &parsedExpr) { + if (const SomeExpr * expr{GetExpr(parsedExpr)}) { + ActualArgumentSet argSet{CollectActualArguments(*expr)}; + for (const evaluate::ActualArgumentRef &argRef : argSet) { + CheckIfArgIsDoVar(*argRef, parsedExpr.source, context_); + } + } + } + +The function ``CheckIfArgIsDoVar()`` was shared with the checks for DO +variables being passed to SUBROUTINE calls. + +At this point, my pull request was approved, and I merged it and deleted the +associated branch. diff --git a/flang/docs/Intrinsics.rst b/flang/docs/Intrinsics.rst new file mode 100644 --- /dev/null +++ b/flang/docs/Intrinsics.rst @@ -0,0 +1,903 @@ +A categorization of standard (2018) and extended Fortran intrinsic procedures +============================================================================= + +This note attempts to group the intrinsic procedures of Fortran into categories +of functions or subroutines with similar interfaces as an aid to +comprehension beyond that which might be gained from the standard's +alphabetical list. + +A brief status of intrinsic procedure support in f18 is also given at the end. + +Few procedures are actually described here apart from their interfaces; see the +Fortran 2018 standard (section 16) for the complete story. + +Intrinsic modules are not covered here. + +General rules +------------- + + +#. 
The value of any intrinsic function's ``KIND`` actual argument, if present, + must be a scalar constant integer expression, of any kind, whose value + resolves to some supported kind of the function's result type. + If optional and absent, the kind of the function's result is + either the default kind of that category or the kind of an argument + (e.g., as in ``AINT``\ ). +#. Procedures are summarized with a non-Fortran syntax for brevity. + Wherever a function has a short definition, it appears after an + equal sign as if it were a statement function. Any functions referenced + in these short summaries are intrinsic. +#. Unless stated otherwise, an actual argument may have any supported kind + of a particular intrinsic type. Sometimes a pattern variable + can appear in a description (e.g., ``REAL(k)``\ ) when the kind of an + actual argument's type must match the kind of another argument, or + determines the kind type parameter of the function result. +#. When an intrinsic type name appears without a kind (e.g., ``REAL``\ ), + it refers to the default kind of that type. Sometimes the word + ``default`` will appear for clarity. +#. The names of the dummy arguments actually matter because they can + be used as keywords for actual arguments. +#. All standard intrinsic functions are pure, even when not elemental. +#. Assumed-rank arguments may not appear as actual arguments unless + expressly permitted. +#. When an argument is described with a default value, e.g. ``KIND=KIND(0)``\ , + it is an optional argument. Optional arguments without defaults, + e.g. ``DIM`` on many transformationals, are wrapped in ``[]`` brackets + as in the Fortran standard. When an intrinsic has optional arguments + with and without default values, the arguments with default values + may appear within the brackets to preserve the order of arguments + (e.g., ``COUNT``\ ). 
+ +Elemental intrinsic functions +============================= + +Pure elemental semantics apply to these functions, to wit: when one or more of +the actual arguments are arrays, the arguments must be conformable, and +the result is also an array. +Scalar arguments are expanded when the arguments are not all scalars. + +Elemental intrinsic functions that may have unrestricted specific procedures +---------------------------------------------------------------------------- + +When an elemental intrinsic function is documented here as having an +*unrestricted specific name*\ , that name may be passed as an actual +argument, used as the target of a procedure pointer, appear in +a generic interface, and be otherwise used as if it were an external +procedure. +An ``INTRINSIC`` statement or attribute may have to be applied to an +unrestricted specific name to enable such usage. + +When a name is being used as a specific procedure for any purpose other +than that of a called function, the specific instance of the function +that accepts and returns values of the default kinds of the intrinsic +types is used. +A Fortran ``INTERFACE`` could be written to define each of +these unrestricted specific intrinsic function names. + +Calls to dummy arguments and procedure pointers that correspond to these +specific names must pass only scalar actual argument values. + +No other intrinsic function name can be passed as an actual argument, +used as a pointer target, appear in a generic interface, or be otherwise +used except as the name of a called function. +Some of these *restricted specific intrinsic functions*\ , e.g. ``FLOAT``\ , +provide a means for invoking a corresponding generic (\ ``REAL`` in the case of ``FLOAT``\ ) +with forced argument and result kinds. +Others, viz. 
``CHAR``\ , ``ICHAR``\ , ``INT``\ , ``REAL``\ , and the lexical comparisons like ``LGE``\ , +have the same name as their generic functions, and it is not clear what purpose +is accomplished by the standard by defining them as specific functions. + +Trigonometric elemental intrinsic functions, generic and (mostly) specific +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +All of these functions can be used as unrestricted specific names. + +.. code-block:: + + ACOS(REAL(k) X) -> REAL(k) + ASIN(REAL(k) X) -> REAL(k) + ATAN(REAL(k) X) -> REAL(k) + ATAN(REAL(k) Y, REAL(k) X) -> REAL(k) = ATAN2(Y, X) + ATAN2(REAL(k) Y, REAL(k) X) -> REAL(k) + COS(REAL(k) X) -> REAL(k) + COSH(REAL(k) X) -> REAL(k) + SIN(REAL(k) X) -> REAL(k) + SINH(REAL(k) X) -> REAL(k) + TAN(REAL(k) X) -> REAL(k) + TANH(REAL(k) X) -> REAL(k) + +These ``COMPLEX`` versions of some of those functions, and the +inverse hyperbolic functions, cannot be used as specific names. + +.. code-block:: + + ACOS(COMPLEX(k) X) -> COMPLEX(k) + ASIN(COMPLEX(k) X) -> COMPLEX(k) + ATAN(COMPLEX(k) X) -> COMPLEX(k) + ACOSH(REAL(k) X) -> REAL(k) + ACOSH(COMPLEX(k) X) -> COMPLEX(k) + ASINH(REAL(k) X) -> REAL(k) + ASINH(COMPLEX(k) X) -> COMPLEX(k) + ATANH(REAL(k) X) -> REAL(k) + ATANH(COMPLEX(k) X) -> COMPLEX(k) + COS(COMPLEX(k) X) -> COMPLEX(k) + COSH(COMPLEX(k) X) -> COMPLEX(k) + SIN(COMPLEX(k) X) -> COMPLEX(k) + SINH(COMPLEX(k) X) -> COMPLEX(k) + TAN(COMPLEX(k) X) -> COMPLEX(k) + TANH(COMPLEX(k) X) -> COMPLEX(k) + +Non-trigonometric elemental intrinsic functions, generic and specific +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +These functions *can* be used as unrestricted specific names. + +.. 
code-block:: + + ABS(REAL(k) A) -> REAL(k) = SIGN(A, 0.0) + AIMAG(COMPLEX(k) Z) -> REAL(k) = Z%IM + AINT(REAL(k) A, KIND=k) -> REAL(KIND) + ANINT(REAL(k) A, KIND=k) -> REAL(KIND) + CONJG(COMPLEX(k) Z) -> COMPLEX(k) = CMPLX(Z%RE, -Z%IM) + DIM(REAL(k) X, REAL(k) Y) -> REAL(k) = X-MIN(X,Y) + DPROD(default REAL X, default REAL Y) -> DOUBLE PRECISION = DBLE(X)*DBLE(Y) + EXP(REAL(k) X) -> REAL(k) + INDEX(CHARACTER(k) STRING, CHARACTER(k) SUBSTRING, LOGICAL(any) BACK=.FALSE., KIND=KIND(0)) -> INTEGER(KIND) + LEN(CHARACTER(k,n) STRING, KIND=KIND(0)) -> INTEGER(KIND) = n + LOG(REAL(k) X) -> REAL(k) + LOG10(REAL(k) X) -> REAL(k) + MOD(INTEGER(k) A, INTEGER(k) P) -> INTEGER(k) = A-P*INT(A/P) + NINT(REAL(k) A, KIND=KIND(0)) -> INTEGER(KIND) + SIGN(REAL(k) A, REAL(k) B) -> REAL(k) + SQRT(REAL(k) X) -> REAL(k) = X ** 0.5 + +These variants, however, *cannot* be used as specific names without recourse to an alias +from the following section: + +.. code-block:: + + ABS(INTEGER(k) A) -> INTEGER(k) = SIGN(A, 0) + ABS(COMPLEX(k) A) -> REAL(k) = HYPOT(A%RE, A%IM) + DIM(INTEGER(k) X, INTEGER(k) Y) -> INTEGER(k) = X-MIN(X,Y) + EXP(COMPLEX(k) X) -> COMPLEX(k) + LOG(COMPLEX(k) X) -> COMPLEX(k) + MOD(REAL(k) A, REAL(k) P) -> REAL(k) = A-P*INT(A/P) + SIGN(INTEGER(k) A, INTEGER(k) B) -> INTEGER(k) + SQRT(COMPLEX(k) X) -> COMPLEX(k) + +Unrestricted specific aliases for some elemental intrinsic functions with distinct names +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. 
code-block:: + + ALOG(REAL X) -> REAL = LOG(X) + ALOG10(REAL X) -> REAL = LOG10(X) + AMOD(REAL A, REAL P) -> REAL = MOD(A, P) + CABS(COMPLEX A) = ABS(A) + CCOS(COMPLEX X) = COS(X) + CEXP(COMPLEX A) -> COMPLEX = EXP(A) + CLOG(COMPLEX X) -> COMPLEX = LOG(X) + CSIN(COMPLEX X) -> COMPLEX = SIN(X) + CSQRT(COMPLEX X) -> COMPLEX = SQRT(X) + CTAN(COMPLEX X) -> COMPLEX = TAN(X) + DABS(DOUBLE PRECISION A) -> DOUBLE PRECISION = ABS(A) + DACOS(DOUBLE PRECISION X) -> DOUBLE PRECISION = ACOS(X) + DASIN(DOUBLE PRECISION X) -> DOUBLE PRECISION = ASIN(X) + DATAN(DOUBLE PRECISION X) -> DOUBLE PRECISION = ATAN(X) + DATAN2(DOUBLE PRECISION Y, DOUBLE PRECISION X) -> DOUBLE PRECISION = ATAN2(Y, X) + DCOS(DOUBLE PRECISION X) -> DOUBLE PRECISION = COS(X) + DCOSH(DOUBLE PRECISION X) -> DOUBLE PRECISION = COSH(X) + DDIM(DOUBLE PRECISION X, DOUBLE PRECISION Y) -> DOUBLE PRECISION = X-MIN(X,Y) + DEXP(DOUBLE PRECISION X) -> DOUBLE PRECISION = EXP(X) + DINT(DOUBLE PRECISION A) -> DOUBLE PRECISION = AINT(A) + DLOG(DOUBLE PRECISION X) -> DOUBLE PRECISION = LOG(X) + DLOG10(DOUBLE PRECISION X) -> DOUBLE PRECISION = LOG10(X) + DMOD(DOUBLE PRECISION A, DOUBLE PRECISION P) -> DOUBLE PRECISION = MOD(A, P) + DNINT(DOUBLE PRECISION A) -> DOUBLE PRECISION = ANINT(A) + DSIGN(DOUBLE PRECISION A, DOUBLE PRECISION B) -> DOUBLE PRECISION = SIGN(A, B) + DSIN(DOUBLE PRECISION X) -> DOUBLE PRECISION = SIN(X) + DSINH(DOUBLE PRECISION X) -> DOUBLE PRECISION = SINH(X) + DSQRT(DOUBLE PRECISION X) -> DOUBLE PRECISION = SQRT(X) + DTAN(DOUBLE PRECISION X) -> DOUBLE PRECISION = TAN(X) + DTANH(DOUBLE PRECISION X) -> DOUBLE PRECISION = TANH(X) + IABS(INTEGER A) -> INTEGER = ABS(A) + IDIM(INTEGER X, INTEGER Y) -> INTEGER = X-MIN(X,Y) + IDNINT(DOUBLE PRECISION A) -> INTEGER = NINT(A) + ISIGN(INTEGER A, INTEGER B) -> INTEGER = SIGN(A, B) + +Generic elemental intrinsic functions without specific names +------------------------------------------------------------ + +(No procedures after this point can be passed as actual 
arguments, used as +pointer targets, or appear as specific procedures in generic interfaces.) + +Elemental conversions +^^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: + + ACHAR(INTEGER(k) I, KIND=KIND('')) -> CHARACTER(KIND,LEN=1) + CEILING(REAL() A, KIND=KIND(0)) -> INTEGER(KIND) + CHAR(INTEGER(any) I, KIND=KIND('')) -> CHARACTER(KIND,LEN=1) + CMPLX(COMPLEX(k) X, KIND=KIND(0.0D0)) -> COMPLEX(KIND) + CMPLX(INTEGER or REAL or BOZ X, INTEGER or REAL or BOZ Y=0, KIND=KIND((0,0))) -> COMPLEX(KIND) + DBLE(INTEGER or REAL or COMPLEX or BOZ A) = REAL(A, KIND=KIND(0.0D0)) + EXPONENT(REAL(any) X) -> default INTEGER + FLOOR(REAL(any) A, KIND=KIND(0)) -> INTEGER(KIND) + IACHAR(CHARACTER(KIND=k,LEN=1) C, KIND=KIND(0)) -> INTEGER(KIND) + ICHAR(CHARACTER(KIND=k,LEN=1) C, KIND=KIND(0)) -> INTEGER(KIND) + INT(INTEGER or REAL or COMPLEX or BOZ A, KIND=KIND(0)) -> INTEGER(KIND) + LOGICAL(LOGICAL(any) L, KIND=KIND(.TRUE.)) -> LOGICAL(KIND) + REAL(INTEGER or REAL or COMPLEX or BOZ A, KIND=KIND(0.0)) -> REAL(KIND) + +Other generic elemental intrinsic functions without specific names +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +N.B. ``BESSEL_JN(N1, N2, X)`` and ``BESSEL_YN(N1, N2, X)`` are categorized +below with the *transformational* intrinsic functions. + +.. 
code-block:: + + BESSEL_J0(REAL(k) X) -> REAL(k) + BESSEL_J1(REAL(k) X) -> REAL(k) + BESSEL_JN(INTEGER(n) N, REAL(k) X) -> REAL(k) + BESSEL_Y0(REAL(k) X) -> REAL(k) + BESSEL_Y1(REAL(k) X) -> REAL(k) + BESSEL_YN(INTEGER(n) N, REAL(k) X) -> REAL(k) + ERF(REAL(k) X) -> REAL(k) + ERFC(REAL(k) X) -> REAL(k) + ERFC_SCALED(REAL(k) X) -> REAL(k) + FRACTION(REAL(k) X) -> REAL(k) + GAMMA(REAL(k) X) -> REAL(k) + HYPOT(REAL(k) X, REAL(k) Y) -> REAL(k) = SQRT(X*X+Y*Y) without spurious overflow + IMAGE_STATUS(INTEGER(any) IMAGE [, scalar TEAM_TYPE TEAM ]) -> default INTEGER + IS_IOSTAT_END(INTEGER(any) I) -> default LOGICAL + IS_IOSTAT_EOR(INTEGER(any) I) -> default LOGICAL + LOG_GAMMA(REAL(k) X) -> REAL(k) + MAX(INTEGER(k) ...) -> INTEGER(k) + MAX(REAL(k) ...) -> REAL(k) + MAX(CHARACTER(KIND=k) ...) -> CHARACTER(KIND=k,LEN=MAX(LEN(...))) + MERGE(any type TSOURCE, same type FSOURCE, LOGICAL(any) MASK) -> type of FSOURCE + MIN(INTEGER(k) ...) -> INTEGER(k) + MIN(REAL(k) ...) -> REAL(k) + MIN(CHARACTER(KIND=k) ...) -> CHARACTER(KIND=k,LEN=MAX(LEN(...))) + MODULO(INTEGER(k) A, INTEGER(k) P) -> INTEGER(k); P*result >= 0 + MODULO(REAL(k) A, REAL(k) P) -> REAL(k) = A - P*FLOOR(A/P) + NEAREST(REAL(k) X, REAL(any) S) -> REAL(k) + OUT_OF_RANGE(INTEGER(any) X, scalar INTEGER or REAL(k) MOLD) -> default LOGICAL + OUT_OF_RANGE(REAL(any) X, scalar REAL(k) MOLD) -> default LOGICAL + OUT_OF_RANGE(REAL(any) X, scalar INTEGER(any) MOLD, scalar LOGICAL(any) ROUND=.FALSE.) -> default LOGICAL + RRSPACING(REAL(k) X) -> REAL(k) + SCALE(REAL(k) X, INTEGER(any) I) -> REAL(k) + SET_EXPONENT(REAL(k) X, INTEGER(any) I) -> REAL(k) + SPACING(REAL(k) X) -> REAL(k) + +Restricted specific aliases for elemental conversions &/or extrema with default intrinsic types +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: + + AMAX0(INTEGER ...) = REAL(MAX(...)) + AMAX1(REAL ...) = MAX(...) + AMIN0(INTEGER...) = REAL(MIN(...)) + AMIN1(REAL ...) = MIN(...) 
+ DMAX1(DOUBLE PRECISION ...) = MAX(...) + DMIN1(DOUBLE PRECISION ...) = MIN(...) + FLOAT(INTEGER I) = REAL(I) + IDINT(DOUBLE PRECISION A) = INT(A) + IFIX(REAL A) = INT(A) + MAX0(INTEGER ...) = MAX(...) + MAX1(REAL ...) = INT(MAX(...)) + MIN0(INTEGER ...) = MIN(...) + MIN1(REAL ...) = INT(MIN(...)) + SNGL(DOUBLE PRECISION A) = REAL(A) + +Generic elemental bit manipulation intrinsic functions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Many of these accept a typeless "BOZ" literal as an actual argument. +It is interpreted as having the kind of intrinsic ``INTEGER`` type +as another argument, as if the typeless were implicitly wrapped +in a call to ``INT()``. +When multiple arguments can be either ``INTEGER`` values or typeless +constants, it is forbidden for *all* of them to be typeless +constants if the result of the function is ``INTEGER`` +(i.e., only ``BGE``\ , ``BGT``\ , ``BLE``\ , and ``BLT`` can have multiple +typeless arguments). + +.. code-block:: + + BGE(INTEGER(n1) or BOZ I, INTEGER(n2) or BOZ J) -> default LOGICAL + BGT(INTEGER(n1) or BOZ I, INTEGER(n2) or BOZ J) -> default LOGICAL + BLE(INTEGER(n1) or BOZ I, INTEGER(n2) or BOZ J) -> default LOGICAL + BLT(INTEGER(n1) or BOZ I, INTEGER(n2) or BOZ J) -> default LOGICAL + BTEST(INTEGER(n1) I, INTEGER(n2) POS) -> default LOGICAL + DSHIFTL(INTEGER(k) I, INTEGER(k) or BOZ J, INTEGER(any) SHIFT) -> INTEGER(k) + DSHIFTL(BOZ I, INTEGER(k), INTEGER(any) SHIFT) -> INTEGER(k) + DSHIFTR(INTEGER(k) I, INTEGER(k) or BOZ J, INTEGER(any) SHIFT) -> INTEGER(k) + DSHIFTR(BOZ I, INTEGER(k), INTEGER(any) SHIFT) -> INTEGER(k) + IAND(INTEGER(k) I, INTEGER(k) or BOZ J) -> INTEGER(k) + IAND(BOZ I, INTEGER(k) J) -> INTEGER(k) + IBCLR(INTEGER(k) I, INTEGER(any) POS) -> INTEGER(k) + IBITS(INTEGER(k) I, INTEGER(n1) POS, INTEGER(n2) LEN) -> INTEGER(k) + IBSET(INTEGER(k) I, INTEGER(any) POS) -> INTEGER(k) + IEOR(INTEGER(k) I, INTEGER(k) or BOZ J) -> INTEGER(k) + IEOR(BOZ I, INTEGER(k) J) -> INTEGER(k) + IOR(INTEGER(k) I, 
INTEGER(k) or BOZ J) -> INTEGER(k) + IOR(BOZ I, INTEGER(k) J) -> INTEGER(k) + ISHFT(INTEGER(k) I, INTEGER(any) SHIFT) -> INTEGER(k) + ISHFTC(INTEGER(k) I, INTEGER(n1) SHIFT, INTEGER(n2) SIZE=BIT_SIZE(I)) -> INTEGER(k) + LEADZ(INTEGER(any) I) -> default INTEGER + MASKL(INTEGER(any) I, KIND=KIND(0)) -> INTEGER(KIND) + MASKR(INTEGER(any) I, KIND=KIND(0)) -> INTEGER(KIND) + MERGE_BITS(INTEGER(k) I, INTEGER(k) or BOZ J, INTEGER(k) or BOZ MASK) = IOR(IAND(I,MASK),IAND(J,NOT(MASK))) + MERGE_BITS(BOZ I, INTEGER(k) J, INTEGER(k) or BOZ MASK) = IOR(IAND(I,MASK),IAND(J,NOT(MASK))) + NOT(INTEGER(k) I) -> INTEGER(k) + POPCNT(INTEGER(any) I) -> default INTEGER + POPPAR(INTEGER(any) I) -> default INTEGER = IAND(POPCNT(I), Z'1') + SHIFTA(INTEGER(k) I, INTEGER(any) SHIFT) -> INTEGER(k) + SHIFTL(INTEGER(k) I, INTEGER(any) SHIFT) -> INTEGER(k) + SHIFTR(INTEGER(k) I, INTEGER(any) SHIFT) -> INTEGER(k) + TRAILZ(INTEGER(any) I) -> default INTEGER + +Character elemental intrinsic functions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +See also ``INDEX`` and ``LEN`` above among the elemental intrinsic functions with +unrestricted specific names. + +.. 
code-block:: + + ADJUSTL(CHARACTER(k,LEN=n) STRING) -> CHARACTER(k,LEN=n) + ADJUSTR(CHARACTER(k,LEN=n) STRING) -> CHARACTER(k,LEN=n) + LEN_TRIM(CHARACTER(k,n) STRING, KIND=KIND(0)) -> INTEGER(KIND) = n + LGE(CHARACTER(k,n1) STRING_A, CHARACTER(k,n2) STRING_B) -> default LOGICAL + LGT(CHARACTER(k,n1) STRING_A, CHARACTER(k,n2) STRING_B) -> default LOGICAL + LLE(CHARACTER(k,n1) STRING_A, CHARACTER(k,n2) STRING_B) -> default LOGICAL + LLT(CHARACTER(k,n1) STRING_A, CHARACTER(k,n2) STRING_B) -> default LOGICAL + SCAN(CHARACTER(k,n) STRING, CHARACTER(k,m) SET, LOGICAL(any) BACK=.FALSE., KIND=KIND(0)) -> INTEGER(KIND) + VERIFY(CHARACTER(k,n) STRING, CHARACTER(k,m) SET, LOGICAL(any) BACK=.FALSE., KIND=KIND(0)) -> INTEGER(KIND) + +``SCAN`` returns the index of the first (or last, if ``BACK=.TRUE.``\ ) character in ``STRING`` +that is present in ``SET``\ , or zero if none is. + +``VERIFY`` is essentially the opposite: it returns the index of the first (or last) character +in ``STRING`` that is *not* present in ``SET``\ , or zero if all are. + +Transformational intrinsic functions +==================================== + +This category comprises a large collection of intrinsic functions that +are collected together because they somehow transform their arguments +in a way that prevents them from being elemental. +All of them are pure, however. + +Some general rules apply to the transformational intrinsic functions: + + +#. ``DIM`` arguments are optional; if present, the actual argument must be + a scalar integer of any kind. +#. When an optional ``DIM`` argument is absent, or an ``ARRAY`` or ``MASK`` + argument is a vector, the result of the function is scalar; otherwise, + the result is an array of the same shape as the ``ARRAY`` or ``MASK`` + argument with the dimension ``DIM`` removed from the shape. +#. When a function takes an optional ``MASK`` argument, it must be conformable + with its ``ARRAY`` argument if it is present, and the mask can be any kind + of ``LOGICAL``. 
It can be scalar. +#. The type ``numeric`` here can be any kind of ``INTEGER``\ , ``REAL``\ , or ``COMPLEX``. +#. The type ``relational`` here can be any kind of ``INTEGER``\ , ``REAL``\ , or ``CHARACTER``. +#. The type ``any`` here denotes any intrinsic or derived type. +#. The notation ``(..)`` denotes an array of any rank (but not an assumed-rank array). + +Logical reduction transformational intrinsic functions +------------------------------------------------------ + +.. code-block:: + + ALL(LOGICAL(k) MASK(..) [, DIM ]) -> LOGICAL(k) + ANY(LOGICAL(k) MASK(..) [, DIM ]) -> LOGICAL(k) + COUNT(LOGICAL(any) MASK(..) [, DIM, KIND=KIND(0) ]) -> INTEGER(KIND) + PARITY(LOGICAL(k) MASK(..) [, DIM ]) -> LOGICAL(k) + +Numeric reduction transformational intrinsic functions +------------------------------------------------------ + +.. code-block:: + + IALL(INTEGER(k) ARRAY(..) [, DIM, MASK ]) -> INTEGER(k) + IANY(INTEGER(k) ARRAY(..) [, DIM, MASK ]) -> INTEGER(k) + IPARITY(INTEGER(k) ARRAY(..) [, DIM, MASK ]) -> INTEGER(k) + NORM2(REAL(k) X(..) [, DIM ]) -> REAL(k) + PRODUCT(numeric ARRAY(..) [, DIM, MASK ]) -> numeric + SUM(numeric ARRAY(..) [, DIM, MASK ]) -> numeric + +``NORM2`` generalizes ``HYPOT`` by computing ``SQRT(SUM(X*X))`` while avoiding spurious overflows. + +Extrema reduction transformational intrinsic functions +------------------------------------------------------ + +.. code-block:: + + MAXVAL(relational(k) ARRAY(..) [, DIM, MASK ]) -> relational(k) + MINVAL(relational(k) ARRAY(..) [, DIM, MASK ]) -> relational(k) + +Locational transformational intrinsic functions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +When the optional ``DIM`` argument is absent, the result is an ``INTEGER(KIND)`` +vector whose length is the rank of ``ARRAY``. +When the optional ``DIM`` argument is present, the result is an ``INTEGER(KIND)`` +array of rank ``RANK(ARRAY)-1`` and shape equal to that of ``ARRAY`` with +the dimension ``DIM`` removed. 
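
The shape rule just stated (the shape of ``ARRAY`` with dimension ``DIM`` removed) can be sketched in a few lines of C++. This is only an illustration of the rule, not code from any compiler; the helper name ``ShapeWithDimRemoved`` is hypothetical, and ``dim`` is assumed to be 1-based as in Fortran.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: the shape of a reduction result when DIM is present,
// i.e. the input shape with dimension `dim` (1-based, as in Fortran) removed.
std::vector<std::size_t> ShapeWithDimRemoved(
    const std::vector<std::size_t> &shape, std::size_t dim) {
  assert(dim >= 1 && dim <= shape.size()); // DIM must name an existing dimension
  std::vector<std::size_t> result;
  result.reserve(shape.size() - 1);
  for (std::size_t i{0}; i < shape.size(); ++i) {
    if (i + 1 != dim) {
      result.push_back(shape[i]);
    }
  }
  return result;
}
```

For a shape of ``(2,3,4)`` and ``DIM=2`` the helper yields ``(2,4)``; for a vector, removing its only dimension leaves an empty shape, that is, a scalar.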
+ +The optional ``BACK`` argument is a scalar LOGICAL value of any kind. +When present and ``.TRUE.``\ , it causes the function to return the index +of the *last* occurrence of the target or extreme value. + +For ``FINDLOC``\ , ``ARRAY`` may have any of the five intrinsic types, and ``VALUE`` +must be a scalar value of a type for which ``ARRAY==VALUE`` or ``ARRAY .EQV. VALUE`` +is an acceptable expression. + +.. code-block:: + + FINDLOC(intrinsic ARRAY(..), scalar VALUE [, DIM, MASK, KIND=KIND(0), BACK=.FALSE. ]) + MAXLOC(relational ARRAY(..) [, DIM, MASK, KIND=KIND(0), BACK=.FALSE. ]) + MINLOC(relational ARRAY(..) [, DIM, MASK, KIND=KIND(0), BACK=.FALSE. ]) + +Data rearrangement transformational intrinsic functions +------------------------------------------------------- + +The optional ``DIM`` argument to these functions must be a scalar integer of +any kind, and it takes a default value of 1 when absent. + +.. code-block:: + + CSHIFT(any ARRAY(..), INTEGER(any) SHIFT(..) [, DIM ]) -> same type/kind/shape as ARRAY + +Either ``SHIFT`` is scalar or ``RANK(SHIFT) == RANK(ARRAY) - 1`` and ``SHAPE(SHIFT)`` is that of ``SHAPE(ARRAY)`` with element ``DIM`` removed. + +.. code-block:: + + EOSHIFT(any ARRAY(..), INTEGER(any) SHIFT(..) [, BOUNDARY, DIM ]) -> same type/kind/shape as ARRAY + + +* ``SHIFT`` is scalar or ``RANK(SHIFT) == RANK(ARRAY) - 1`` and ``SHAPE(SHIFT)`` is that of ``SHAPE(ARRAY)`` with element ``DIM`` removed. +* If ``BOUNDARY`` is present, it must have the same type and parameters as ``ARRAY``. +* If ``BOUNDARY`` is absent, ``ARRAY`` must be of an intrinsic type, and the default ``BOUNDARY`` is the obvious ``0``\ , ``' '``\ , or ``.FALSE.`` value of ``KIND(ARRAY)``. +* If ``BOUNDARY`` is present, either it is scalar, or ``RANK(BOUNDARY) == RANK(ARRAY) - 1`` and ``SHAPE(BOUNDARY)`` is that of ``SHAPE(ARRAY)`` with element ``DIM`` + removed. + +.. 
code-block:: + + PACK(any ARRAY(..), LOGICAL(any) MASK(..)) -> vector of same type and kind as ARRAY + + +* ``MASK`` is conformable with ``ARRAY`` and may be scalar. +* The length of the result vector is ``COUNT(MASK)`` if ``MASK`` is an array, else ``SIZE(ARRAY)`` if ``MASK`` is ``.TRUE.``\ , else zero. + +.. code-block:: + + PACK(any ARRAY(..), LOGICAL(any) MASK(..), any VECTOR(n)) -> vector of same type, kind, and size as VECTOR + + +* ``MASK`` is conformable with ``ARRAY`` and may be scalar. +* ``VECTOR`` has the same type and kind as ``ARRAY``. +* ``VECTOR`` must not be smaller than result of ``PACK`` with no ``VECTOR`` argument. +* The leading elements of ``VECTOR`` are replaced with elements from ``ARRAY`` as + if ``PACK`` had been invoked without ``VECTOR``. + +.. code-block:: + + RESHAPE(any SOURCE(..), INTEGER(k) SHAPE(n) [, PAD(..), INTEGER(k2) ORDER(n) ]) -> SOURCE array with shape SHAPE + + +* If ``ORDER`` is present, it is a vector of the same size as ``SHAPE``\ , and + contains a permutation. +* The element(s) of ``PAD`` are used to fill out the result once ``SOURCE`` + has been consumed. + +.. code-block:: + + SPREAD(any SOURCE, DIM, scalar INTEGER(any) NCOPIES) -> same type as SOURCE, rank=RANK(SOURCE)+1 + TRANSFER(any SOURCE, any MOLD) -> scalar if MOLD is scalar, else vector; same type and kind as MOLD + TRANSFER(any SOURCE, any MOLD, scalar INTEGER(any) SIZE) -> vector(SIZE) of type and kind of MOLD + TRANSPOSE(any MATRIX(n,m)) -> matrix(m,n) of same type and kind as MATRIX + +The shape of the result of ``SPREAD`` is the same as that of ``SOURCE``\ , with ``NCOPIES`` inserted +at position ``DIM``. + +.. code-block:: + + UNPACK(any VECTOR(n), LOGICAL(any) MASK(..), FIELD) -> type and kind of VECTOR, shape of MASK + +``FIELD`` has same type and kind as ``VECTOR`` and is conformable with ``MASK``. + +Other transformational intrinsic functions +------------------------------------------ + +.. 
code-block:: + + BESSEL_JN(INTEGER(n1) N1, INTEGER(n2) N2, REAL(k) X) -> REAL(k) vector (MAX(N2-N1+1,0)) + BESSEL_YN(INTEGER(n1) N1, INTEGER(n2) N2, REAL(k) X) -> REAL(k) vector (MAX(N2-N1+1,0)) + COMMAND_ARGUMENT_COUNT() -> scalar default INTEGER + DOT_PRODUCT(LOGICAL(k) VECTOR_A(n), LOGICAL(k) VECTOR_B(n)) -> LOGICAL(k) = ANY(VECTOR_A .AND. VECTOR_B) + DOT_PRODUCT(COMPLEX(any) VECTOR_A(n), numeric VECTOR_B(n)) = SUM(CONJG(VECTOR_A) * VECTOR_B) + DOT_PRODUCT(INTEGER(any) or REAL(any) VECTOR_A(n), numeric VECTOR_B(n)) = SUM(VECTOR_A * VECTOR_B) + MATMUL(numeric ARRAY_A(j), numeric ARRAY_B(j,k)) -> numeric vector(k) + MATMUL(numeric ARRAY_A(j,k), numeric ARRAY_B(k)) -> numeric vector(j) + MATMUL(numeric ARRAY_A(j,k), numeric ARRAY_B(k,m)) -> numeric matrix(j,m) + MATMUL(LOGICAL(n1) ARRAY_A(j), LOGICAL(n2) ARRAY_B(j,k)) -> LOGICAL vector(k) + MATMUL(LOGICAL(n1) ARRAY_A(j,k), LOGICAL(n2) ARRAY_B(k)) -> LOGICAL vector(j) + MATMUL(LOGICAL(n1) ARRAY_A(j,k), LOGICAL(n2) ARRAY_B(k,m)) -> LOGICAL matrix(j,m) + NULL([POINTER/ALLOCATABLE MOLD]) -> POINTER + REDUCE(any ARRAY(..), function OPERATION [, DIM, LOGICAL(any) MASK(..), IDENTITY, LOGICAL ORDERED=.FALSE. ]) + REPEAT(CHARACTER(k,n) STRING, INTEGER(any) NCOPIES) -> CHARACTER(k,n*NCOPIES) + SELECTED_CHAR_KIND('DEFAULT' or 'ASCII' or 'ISO_10646' or ...) -> scalar default INTEGER + SELECTED_INT_KIND(scalar INTEGER(any) R) -> scalar default INTEGER + SELECTED_REAL_KIND([scalar INTEGER(any) P, scalar INTEGER(any) R, scalar INTEGER(any) RADIX]) -> scalar default INTEGER + SHAPE(SOURCE, KIND=KIND(0)) -> INTEGER(KIND)(RANK(SOURCE)) + TRIM(CHARACTER(k,n) STRING) -> CHARACTER(k) + +The type and kind of the result of a numeric ``MATMUL`` is the same as would result from +a multiplication of an element of ARRAY_A and an element of ARRAY_B. 
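+
+As a rough illustration of the numeric ``MATMUL`` rule (a minimal sketch; the program
+and variable names are invented for this note and are not part of f18), the mixed
+``INTEGER``/``DOUBLE PRECISION`` product below should have the same kind as the
+product of one element from each array:
+
+.. code-block:: fortran
+
+   ! Hypothetical example; all names are illustrative only.
+   program matmul_kind_demo
+     implicit none
+     integer :: a(2,3)
+     double precision :: b(3,2)
+     a = 1
+     b = 2.0d0
+     ! An INTEGER element times a DOUBLE PRECISION element is DOUBLE PRECISION,
+     ! so the two KIND inquiries below are expected to agree.
+     print *, kind(matmul(a, b)) == kind(a(1,1) * b(1,1))
+     ! A (2,3) by (3,2) product is a matrix of shape (2,2).
+     print *, shape(matmul(a, b))
+   end program matmul_kind_demo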
+ +The kind of the ``LOGICAL`` result of a ``LOGICAL`` ``MATMUL`` is the same as would result +from an intrinsic ``.AND.`` operation between an element of ``ARRAY_A`` and an element +of ``ARRAY_B``. + +Note that ``DOT_PRODUCT`` with a ``COMPLEX`` first argument operates on its complex conjugate, +but that ``MATMUL`` with a ``COMPLEX`` argument does not. + +The ``MOLD`` argument to ``NULL`` may be omitted only in a context where the type of the pointer is known, +such as an initializer or pointer assignment statement. + +At least one argument must be present in a call to ``SELECTED_REAL_KIND``. + +An assumed-rank array may be passed to ``SHAPE``\ , and if it is associated with an assumed-size array, +the last element of the result will be -1. + +Coarray transformational intrinsic functions +-------------------------------------------- + +.. code-block:: + + FAILED_IMAGES([scalar TEAM_TYPE TEAM, KIND=KIND(0)]) -> INTEGER(KIND) vector + GET_TEAM([scalar INTEGER(?) LEVEL]) -> scalar TEAM_TYPE + IMAGE_INDEX(COARRAY, INTEGER(any) SUB(n) [, scalar TEAM_TYPE TEAM ]) -> scalar default INTEGER + IMAGE_INDEX(COARRAY, INTEGER(any) SUB(n), scalar INTEGER(any) TEAM_NUMBER) -> scalar default INTEGER + NUM_IMAGES([scalar TEAM_TYPE TEAM]) -> scalar default INTEGER + NUM_IMAGES(scalar INTEGER(any) TEAM_NUMBER) -> scalar default INTEGER + STOPPED_IMAGES([scalar TEAM_TYPE TEAM, KIND=KIND(0)]) -> INTEGER(KIND) vector + TEAM_NUMBER([scalar TEAM_TYPE TEAM]) -> scalar default INTEGER + THIS_IMAGE([COARRAY, DIM, scalar TEAM_TYPE TEAM]) -> default INTEGER + +The result of ``THIS_IMAGE`` is a scalar if ``DIM`` is present or if ``COARRAY`` is absent, +and a vector whose length is the corank of ``COARRAY`` otherwise. + +Inquiry intrinsic functions +=========================== + +These are neither elemental nor transformational; all are pure. + +Type inquiry intrinsic functions +-------------------------------- + +All of these functions return constants. 
+The value of the argument is not used, and may well be undefined. + +.. code-block:: + + BIT_SIZE(INTEGER(k) I(..)) -> INTEGER(k) + DIGITS(INTEGER or REAL X(..)) -> scalar default INTEGER + EPSILON(REAL(k) X(..)) -> scalar REAL(k) + HUGE(INTEGER(k) X(..)) -> scalar INTEGER(k) + HUGE(REAL(k) X(..)) -> scalar of REAL(k) + KIND(intrinsic X(..)) -> scalar default INTEGER + MAXEXPONENT(REAL(k) X(..)) -> scalar default INTEGER + MINEXPONENT(REAL(k) X(..)) -> scalar default INTEGER + NEW_LINE(CHARACTER(k,n) A(..)) -> scalar CHARACTER(k,1) = CHAR(10) + PRECISION(REAL(k) or COMPLEX(k) X(..)) -> scalar default INTEGER + RADIX(INTEGER(k) or REAL(k) X(..)) -> scalar default INTEGER, always 2 + RANGE(INTEGER(k) or REAL(k) or COMPLEX(k) X(..)) -> scalar default INTEGER + TINY(REAL(k) X(..)) -> scalar REAL(k) + +Bound and size inquiry intrinsic functions +------------------------------------------ + +The results are scalar when ``DIM`` is present, and a vector of length=(co)rank(\ ``(CO)ARRAY``\ ) +when ``DIM`` is absent. + +.. code-block:: + + LBOUND(any ARRAY(..) [, DIM, KIND=KIND(0) ]) -> INTEGER(KIND) + LCOBOUND(any COARRAY [, DIM, KIND=KIND(0) ]) -> INTEGER(KIND) + SIZE(any ARRAY(..) [, DIM, KIND=KIND(0) ]) -> INTEGER(KIND) + UBOUND(any ARRAY(..) [, DIM, KIND=KIND(0) ]) -> INTEGER(KIND) + UCOBOUND(any COARRAY [, DIM, KIND=KIND(0) ]) -> INTEGER(KIND) + +Assumed-rank arrays may be used with ``LBOUND``\ , ``SIZE``\ , and ``UBOUND``. + +Object characteristic inquiry intrinsic functions +------------------------------------------------- + +.. 
code-block:: + + ALLOCATED(any type ALLOCATABLE ARRAY) -> scalar default LOGICAL + ALLOCATED(any type ALLOCATABLE SCALAR) -> scalar default LOGICAL + ASSOCIATED(any type POINTER POINTER [, same type TARGET]) -> scalar default LOGICAL + COSHAPE(COARRAY, KIND=KIND(0)) -> INTEGER(KIND) vector of length corank(COARRAY) + EXTENDS_TYPE_OF(A, MOLD) -> default LOGICAL + IS_CONTIGUOUS(any data ARRAY(..)) -> scalar default LOGICAL + PRESENT(OPTIONAL A) -> scalar default LOGICAL + RANK(any data A) -> scalar default INTEGER = 0 if A is scalar, SIZE(SHAPE(A)) if A is an array, rank if assumed-rank + SAME_TYPE_AS(A, B) -> scalar default LOGICAL + STORAGE_SIZE(any data A, KIND=KIND(0)) -> INTEGER(KIND) + +The arguments to ``EXTENDS_TYPE_OF`` must be of extensible derived types or be unlimited polymorphic. + +An assumed-rank array may be used with ``IS_CONTIGUOUS`` and ``RANK``. + +Intrinsic subroutines +===================== + +(\ *TODO*\ : complete these descriptions) + +One elemental intrinsic subroutine +---------------------------------- + +.. code-block:: + + INTERFACE + SUBROUTINE MVBITS(FROM, FROMPOS, LEN, TO, TOPOS) + INTEGER(k1) :: FROM, TO + INTENT(IN) :: FROM + INTENT(INOUT) :: TO + INTEGER(k2), INTENT(IN) :: FROMPOS + INTEGER(k3), INTENT(IN) :: LEN + INTEGER(k4), INTENT(IN) :: TOPOS + END SUBROUTINE + END INTERFACE + +Non-elemental intrinsic subroutines +----------------------------------- + +.. code-block:: + + CALL CPU_TIME(REAL INTENT(OUT) TIME) + +The kind of ``TIME`` is not specified in the standard. + +.. code-block:: + + CALL DATE_AND_TIME([DATE, TIME, ZONE, VALUES]) + + +* All arguments are ``OPTIONAL`` and ``INTENT(OUT)``. +* ``DATE``\ , ``TIME``\ , and ``ZONE`` are scalar default ``CHARACTER``. +* ``VALUES`` is a vector of at least 8 elements of ``INTEGER(KIND >= 2)``. + .. 
code-block:: + + CALL EVENT_QUERY(EVENT, COUNT [, STAT]) + CALL EXECUTE_COMMAND_LINE(COMMAND [, WAIT, EXITSTAT, CMDSTAT, CMDMSG ]) + CALL GET_COMMAND([COMMAND, LENGTH, STATUS, ERRMSG ]) + CALL GET_COMMAND_ARGUMENT(NUMBER [, VALUE, LENGTH, STATUS, ERRMSG ]) + CALL GET_ENVIRONMENT_VARIABLE(NAME [, VALUE, LENGTH, STATUS, TRIM_NAME, ERRMSG ]) + CALL MOVE_ALLOC(ALLOCATABLE INTENT(INOUT) FROM, ALLOCATABLE INTENT(OUT) TO [, STAT, ERRMSG ]) + CALL RANDOM_INIT(LOGICAL(k1) INTENT(IN) REPEATABLE, LOGICAL(k2) INTENT(IN) IMAGE_DISTINCT) + CALL RANDOM_NUMBER(REAL(k) INTENT(OUT) HARVEST(..)) + CALL RANDOM_SEED([SIZE, PUT, GET]) + CALL SYSTEM_CLOCK([COUNT, COUNT_RATE, COUNT_MAX]) + +Atomic intrinsic subroutines +---------------------------- + +.. code-block:: + + CALL ATOMIC_ADD(ATOM, VALUE [, STAT=]) + CALL ATOMIC_AND(ATOM, VALUE [, STAT=]) + CALL ATOMIC_CAS(ATOM, OLD, COMPARE, NEW [, STAT=]) + CALL ATOMIC_DEFINE(ATOM, VALUE [, STAT=]) + CALL ATOMIC_FETCH_ADD(ATOM, VALUE, OLD [, STAT=]) + CALL ATOMIC_FETCH_AND(ATOM, VALUE, OLD [, STAT=]) + CALL ATOMIC_FETCH_OR(ATOM, VALUE, OLD [, STAT=]) + CALL ATOMIC_FETCH_XOR(ATOM, VALUE, OLD [, STAT=]) + CALL ATOMIC_OR(ATOM, VALUE [, STAT=]) + CALL ATOMIC_REF(VALUE, ATOM [, STAT=]) + CALL ATOMIC_XOR(ATOM, VALUE [, STAT=]) + +Collective intrinsic subroutines +-------------------------------- + +.. code-block:: + + CALL CO_BROADCAST + CALL CO_MAX + CALL CO_MIN + CALL CO_REDUCE + CALL CO_SUM + +Non-standard intrinsics +======================= + +PGI +--- + +.. code-block:: + + AND, OR, XOR + LSHIFT, RSHIFT, SHIFT + ZEXT, IZEXT + COSD, SIND, TAND, ACOSD, ASIND, ATAND, ATAN2D + COMPL + DCMPLX + EQV, NEQV + INT8 + JINT, JNINT, KNINT + LOC + +Intel +----- + +.. 
code-block:: + + DCMPLX(X,Y), QCMPLX(X,Y) + DREAL(DOUBLE COMPLEX A) -> DOUBLE PRECISION + DFLOAT, DREAL + QEXT, QFLOAT, QREAL + DNUM, INUM, JNUM, KNUM, QNUM, RNUM - scan value from string + ZEXT + RAN, RANF + ILEN(I) = BIT_SIZE(I) + SIZEOF + MCLOCK, SECNDS + COTAN(X) = 1.0/TAN(X) + COSD, SIND, TAND, ACOSD, ASIND, ATAND, ATAN2D, COTAND - degrees + AND, OR, XOR + LSHIFT, RSHIFT + IBCHNG, ISHA, ISHC, ISHL, IXOR + IARG, IARGC, NARGS, NUMARG + BADDRESS, IADDR + CACHESIZE, EOF, FP_CLASS, INT_PTR_KIND, ISNAN, LOC + MALLOC + +Intrinsic Procedure Support in f18 +================================== + +This section gives an overview of the support inside f18 libraries for the +intrinsic procedures listed above. +It may be outdated; refer to the f18 code base for the actual support status. + +Semantic Analysis +----------------- + +The F18 semantic expression analysis phase detects intrinsic procedure references, +validates the argument types, and deduces the return types. +This phase currently supports all the intrinsic procedures listed above except those in the table below. + +.. 
list-table:: + :header-rows: 1 + + * - Intrinsic Category + - Intrinsic Procedures Lacking Support + * - Coarray intrinsic functions + - LCOBOUND, UCOBOUND, FAILED_IMAGES, GET_TEAM, IMAGE_INDEX, STOPPED_IMAGES, TEAM_NUMBER, THIS_IMAGE, COSHAPE + * - Object characteristic inquiry functions + - ALLOCATED, ASSOCIATED, EXTENDS_TYPE_OF, IS_CONTIGUOUS, PRESENT, RANK, SAME_TYPE, STORAGE_SIZE + * - Type inquiry intrinsic functions + - BIT_SIZE, DIGITS, EPSILON, HUGE, KIND, MAXEXPONENT, MINEXPONENT, NEW_LINE, PRECISION, RADIX, RANGE, TINY + * - Non-standard intrinsic functions + - AND, OR, XOR, LSHIFT, RSHIFT, SHIFT, ZEXT, IZEXT, COSD, SIND, TAND, ACOSD, ASIND, ATAND, ATAN2D, COMPL, DCMPLX, EQV, NEQV, INT8, JINT, JNINT, KNINT, LOC, QCMPLX, DREAL, DFLOAT, QEXT, QFLOAT, QREAL, DNUM, NUM, JNUM, KNUM, QNUM, RNUM, RAN, RANF, ILEN, SIZEOF, MCLOCK, SECNDS, COTAN, IBCHNG, ISHA, ISHC, ISHL, IXOR, IARG, IARGC, NARGS, NUMARG, BADDRESS, IADDR, CACHESIZE, EOF, FP_CLASS, INT_PTR_KIND, ISNAN, MALLOC + * - Intrinsic subroutines + - MVBITS (elemental), CPU_TIME, DATE_AND_TIME, EVENT_QUERY, EXECUTE_COMMAND_LINE, GET_COMMAND, GET_COMMAND_ARGUMENT, GET_ENVIRONMENT_VARIABLE, MOVE_ALLOC, RANDOM_INIT, RANDOM_NUMBER, RANDOM_SEED, SYSTEM_CLOCK + * - Atomic intrinsic subroutines + - ATOMIC_ADD &al. + * - Collective intrinsic subroutines + - CO_BROADCAST &al. + + +Intrinsic Function Folding +-------------------------- + +Fortran Constant Expressions can contain references to a certain number of +intrinsic functions (see Fortran 2018 standard section 10.1.12 for more details). +Constant Expressions may be used to define kind arguments. Therefore, the semantic +expression analysis phase must be able to fold references to intrinsic functions +listed in section 10.1.12. + +F18 intrinsic function folding is either performed by implementations directly +operating on f18 scalar types or by using host runtime functions and +host hardware types. 
F18 supports folding elemental intrinsic functions over +arrays when an implementation is provided for the scalars (regardless of whether +it is using host hardware types or not). +The status of intrinsic function folding support is given in the sub-sections below. + +Intrinsic Functions with Host Independent Folding Support +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Implementations using f18 scalar types enable folding intrinsic functions +on any host and with any possible type kind supported by f18. The intrinsic functions +listed below are folded using host independent implementations. + +.. list-table:: + :header-rows: 1 + + * - Return Type + - Intrinsic Functions with Host Independent Folding Support + * - INTEGER + - ABS(INTEGER(k)), DIM(INTEGER(k), INTEGER(k)), DSHIFTL, DSHIFTR, IAND, IBCLR, IBSET, IEOR, INT, IOR, ISHFT, KIND, LEN, LEADZ, MASKL, MASKR, MERGE_BITS, POPCNT, POPPAR, SHIFTA, SHIFTL, SHIFTR, TRAILZ + * - REAL + - ABS(REAL(k)), ABS(COMPLEX(k)), AIMAG, AINT, DPROD, REAL + * - COMPLEX + - CMPLX, CONJG + * - LOGICAL + - BGE, BGT, BLE, BLT + + +Intrinsic Functions with Host Dependent Folding Support +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Implementations using the host runtime may not be available for all supported +f18 types depending on the host hardware types and the libraries available on the host. +The actual support on a host depends on what the host hardware types are. +The list below gives the functions that are folded using host runtime and the related C/C++ types. +F18 automatically detects if these types match an f18 scalar type. If so, +folding of the intrinsic functions will be possible for the related f18 scalar type, +otherwise an error message will be produced by f18 when attempting to fold related intrinsic functions. + +.. 
list-table:: + :header-rows: 1 + + * - C/C++ Host Type + - Intrinsic Functions with Host Standard C++ Library Based Folding Support + * - float, double and long double + - ACOS, ACOSH, ASINH, ATAN, ATAN2, ATANH, COS, COSH, ERF, ERFC, EXP, GAMMA, HYPOT, LOG, LOG10, LOG_GAMMA, MOD, SIN, SINH, SQRT, TAN, TANH + * - std::complex for float, double and long double + - ACOS, ACOSH, ASIN, ASINH, ATAN, ATANH, COS, COSH, EXP, LOG, SIN, SINH, SQRT, TAN, TANH + + +On top of the default usage of C++ standard library functions for folding described +in the table above, it is possible to compile the f18 evaluate library with +`libpgmath `_ +so that it can be used for folding. To do so, one must have a compiled version +of the libpgmath library available on the host and add +``-DLIBPGMATH_DIR=`` to the f18 cmake command. + +Libpgmath comes with real and complex functions that replace C++ standard library +float and double functions to fold all the intrinsic functions listed in the table above. +It has no long double versions. If the host long double matches an f18 scalar type, +C++ standard library functions will still be used for folding expressions with this scalar type. +Libpgmath adds the possibility to fold the following functions for f18 real scalar +types related to host float and double types. + +.. list-table:: + :header-rows: 1 + + * - C/C++ Host Type + - Additional Intrinsic Function Folding Support with Libpgmath (Optional) + * - float and double + - BESSEL_J0, BESSEL_J1, BESSEL_JN (elemental only), BESSEL_Y0, BESSEL_Y1, BESSEL_YN (elemental only), ERFC_SCALED + + +Libpgmath comes in three variants (precise, relaxed and fast). So far, only the +precise version is used for intrinsic function folding in f18. It guarantees the greatest numerical precision. + +Intrinsic Functions with Missing Folding Support +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The following intrinsic functions are allowed in constant expressions but f18 +is not yet able to fold them. 
Note that there might be constraints on the arguments +so that these intrinsics can be used in constant expressions (see section 10.1.12 of the Fortran 2018 standard). + +ALL, ACHAR, ADJUSTL, ADJUSTR, ANINT, ANY, BESSEL_JN (transformational only), +BESSEL_YN (transformational only), BTEST, CEILING, CHAR, COUNT, CSHIFT, +DIM (REAL only), DOT_PRODUCT, EOSHIFT, FINDLOC, FLOOR, FRACTION, HUGE, IACHAR, IALL, +IANY, IPARITY, IBITS, ICHAR, IMAGE_STATUS, INDEX, ISHFTC, IS_IOSTAT_END, +IS_IOSTAT_EOR, LBOUND, LEN_TRIM, LGE, LGT, LLE, LLT, LOGICAL, MATMUL, MAX, MAXLOC, +MAXVAL, MERGE, MIN, MINLOC, MINVAL, MOD (INTEGER only), MODULO, NEAREST, NINT, +NORM2, NOT, OUT_OF_RANGE, PACK, PARITY, PRODUCT, REPEAT, REDUCE, RESHAPE, +RRSPACING, SCAN, SCALE, SELECTED_CHAR_KIND, SELECTED_INT_KIND, SELECTED_REAL_KIND, +SET_EXPONENT, SHAPE, SIGN, SIZE, SPACING, SPREAD, SUM, TINY, TRANSFER, TRANSPOSE, +TRIM, UBOUND, UNPACK, VERIFY. + +Coarray, non-standard, IEEE and ISO_C_BINDING intrinsic functions that can be +used in constant expressions currently have no folding support at all. diff --git a/flang/docs/LabelResolution.rst b/flang/docs/LabelResolution.rst new file mode 100644 --- /dev/null +++ b/flang/docs/LabelResolution.rst @@ -0,0 +1,378 @@ +Semantics: Resolving Labels and Construct Names +=============================================== + +Overview +-------- + +After the Fortran input file(s) has been parsed into a syntax tree, the compiler must check that the program is semantically valid. Target labels must be checked and violations of legal semantics should be reported to the user. + +This is the detailed design document on how these labels will be semantically checked. Legal semantics may result in rewrite operations on the syntax tree. Semantics violations will be reported as errors to the user. 
+ +Requirements +------------ + + +* Input: a parse tree that decomposes the Fortran program unit +* Output: + + * **Success** returns true + (Additionally, the parse tree may be rewritten on success to capture the nested DO loop structure explicitly from any *label-do-stmt* type loops.) + * **Failure** returns false, instantiates (a container of) error message(s) to indicate the problem(s) + +Label generalities (6.2.5) +^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Enforcement of the general label constraints. There are three sorts of label usage. Labels can serve + + +#. as a *label-do-stmt* block range marker +#. as branching (control flow) targets +#. as specification annotations (\ ``FORMAT`` statements) for data transfer statements (I/O constructs) + +Labels are related to the standard definition of inclusive scope. For example, control-flow arcs are not allowed to originate from one inclusive scope and target statements outside of that inclusive scope. + +Inclusive scope is defined as a tree structure of nested scoping constructs. A statement, *s*\ , is said to be *in* the same inclusive scope as another statement, *t*\ , if and only if *s* and *t* are in the same scope or *t* is in one of the enclosing scopes of *s*\ , otherwise *s* is *not in* the same inclusive scope as *t*. (Inclusive scope is unidirectional and is always from innermost scopes to outermost scopes.) 
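+
+The one-directional nature of inclusive scope can be sketched with a small program
+(a hedged illustration only; the program name, labels, and variable are invented for
+this note). The ``GOTO 20`` inside the ``IF`` construct targets a statement in an
+enclosing scope, which is allowed, whereas the commented-out ``GOTO 10`` would branch
+from outside the ``IF`` construct into it and should be diagnosed:
+
+.. code-block:: fortran
+
+   ! Hypothetical example; all names and labels are illustrative only.
+   program inclusive_scope_demo
+     implicit none
+     integer :: i
+     i = 1
+     if (i > 0) then
+       ! Allowed: label 20 is in an enclosing scope of this statement.
+       goto 20
+   10  print *, 'inside the IF construct'
+     end if
+     ! Not allowed if uncommented: label 10 is inside the IF construct,
+     ! which is not an enclosing scope of this statement.
+     ! goto 10
+   20 continue
+     print *, 'done'
+   end program inclusive_scope_demo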
+ +Semantic Checks +~~~~~~~~~~~~~~~ + + +* labels range from 1 to 99999, inclusive (6.2.5 note 2) + + * handled automatically by the parser, but add a range check + +* labels must be pairwise distinct within their program unit scope (6.2.5 para 2) + + * if redundant labels appear → error redundant labels + * the total number of unique statement labels may have a limit + +Labels Used for ``DO`` Loop Ranging +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +*label-do-stmt* (R1121) +~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A *label-do-stmt* is a control construct that results in the iterative execution of a number of statements. A *label-do-stmt* has a (possibly shared, *nonblock-do-construct*\ ) *label* that will be called the loop target label. The statements to be executed will be the range from the *label-do-stmt* to the statement identified by the loop target label, inclusive. This range of statements will be called the loop's body and logically forms a *do-block*. + +A *label-do-stmt* is quite similar to a *block-do-construct* in semantics, but the parse tree is different in that the parser does not impose a *do-block* structure on the loop body. + +In F18, the nonblock ``DO`` construct has been removed. For legacy support (through F08), we will need to handle nonblock ``DO`` constructs. In F18, the following legacy code is an error. + +.. code-block:: fortran + + DO 100 I = 1, 100 + DO 100 J = 1, 100 + ... 
+ 100 CONTINUE + +Semantic Checks +""""""""""""""" + + +* the loop body target label must exist in the scope (F18:C1133; F08:C815, C817, C819) + + * if the label does not appear, error of missing label + +* the loop body target label must be, lexically, after the *label-do-stmt* (R1119) + + * if the label appears lexically preceding the ``DO``\ , error of malformed ``DO`` + +* control cannot transfer into the body from outside the *do-block* + + * Exceptions (errors demoted to warnings) + + * some implementations relax enforcement of this and allow ``GOTO``\ s from the loop body to "extended ranges" and back again (PGI & gfortran appear to allow this; NAG & Intel do not.) + * should some form of "extended ranges" for *do-constructs* be supported, it should still be limited and not include parallel loops such as ``DO CONCURRENT`` or loops annotated with OpenACC or OpenMP directives. + + * ``GOTO``\ s into the ``DO``\ 's inclusive scope, error/warn of invalid transfer of control + +* requires that the loop terminating statement for a *label-do-stmt* be either an ``END DO`` or a ``CONTINUE`` + + * Exception + + * earlier standards allowed other statements to be terminators + +Semantics for F08 and earlier that support sharing the loop terminating statement in a *nonblock-do-construct* between multiple loops + + +* some statements cannot be *do-term-action-stmt* (F08:C816) + + * a *do-term-action-stmt* is an *action-stmt* but does not include *arithmetic-if-stmt*\ , *continue-stmt*\ , *cycle-stmt*\ , *end-function-stmt*\ , *end-mp-subprogram-stmt*\ , *end-program-stmt*\ , *end-subroutine-stmt*\ , *error-stop-stmt*\ , *exit-stmt*\ , *goto-stmt*\ , *return-stmt*\ , or *stop-stmt* + + * if the term action statement is forbidden, error invalid statement in ``DO`` loop term position + +* some statements cannot be *do-term-shared-stmt* (F08:C818) + + * this is the case as in our above example where two different nested loops share the same terminating statement (\ ``100 continue``\ 
) + * a *do-term-shared-stmt* is an *action-stmt* with all the same exclusions as a *do-term-action-stmt* except a *continue-stmt* **is** allowed + + * if the term shared action statement is forbidden, error invalid statement in term position + +If the ``DO`` loop is a ``DO CONCURRENT`` construct, there are additional constraints (11.1.7.5). + + +* a *return-stmt* is not allowed (C1136) +* image control statements are not allowed (C1137) +* branches must be from a statement and to a statement that both reside within the ``DO CONCURRENT`` (C1138) +* impure procedures shall not be called (C1139) +* deallocation of polymorphic objects is not allowed (C1140) +* references to ``IEEE_GET_FLAG``\ , ``IEEE_SET_HALTING_MODE``\ , and ``IEEE_GET_HALTING_MODE`` cannot appear in the body of a ``DO CONCURRENT`` (C1141) +* the use of the ``ADVANCE=`` specifier by an I/O statement in the body of a ``DO CONCURRENT`` is not allowed (11.1.7.5, para 5) + +Labels Used in Branching +^^^^^^^^^^^^^^^^^^^^^^^^ + +*goto-stmt* (11.2.2, R1157) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +A ``GOTO`` statement is a simple, direct transfer of control from the ``GOTO`` to the labelled statement. 
+ +Semantic Checks +""""""""""""""" + + +* the labelled statement that is the target of a ``GOTO`` (11.2.1 constraints) + + * must refer to a label that is in inclusive scope of the ``GOTO`` statement (C1169) + + * if a label does not exist, error nonexistent label + * if a label is out of scope, error out of inclusive scope + + * the branch target statement must be valid + + * if the statement is not allowed as a branch target, error not a valid branch target + +* the labelled statement must be a branch target statement + + * a branch target statement is any of *action-stmt*\ , *associate-stmt*\ , *end-associate-stmt*\ , *if-then-stmt*\ , *end-if-stmt*\ , *select-case-stmt*\ , *end-select-stmt*\ , *select-rank-stmt*\ , *end-select-rank-stmt*\ , *select-type-stmt*\ , *end-select-type-stmt*\ , *do-stmt*\ , *end-do-stmt*\ , *block-stmt*\ , *end-block-stmt*\ , *critical-stmt*\ , *end-critical-stmt*\ , *forall-construct-stmt*\ , *forall-stmt*\ , *where-construct-stmt*\ , *end-function-stmt*\ , *end-mp-subprogram-stmt*\ , *end-program-stmt*\ , or *end-subroutine-stmt*. (11.2.1) + * Some deleted features that were *action-stmt* in older standards include *arithmetic-if-stmt*\ , *assign-stmt*\ , *assigned-goto-stmt*\ , and *pause-stmt*. For legacy mode support, these statements should be considered *action-stmt*. + +*computed-goto-stmt* (11.2.3, R1158) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +The computed ``GOTO`` statement is analogous to a ``switch`` statement in C++. + +.. 
code-block:: fortran + + GOTO ( label-list ) [,] scalar-int-expr + +Semantics Checks +"""""""""""""""" + + +* each label in *label-list* (11.2.1 constraints, same as ``GOTO``\ ) + + * must refer to a label that is in inclusive scope of the computed ``GOTO`` statement (C1170) + + * if a label does not exist, error nonexistent label + * if a label is out of scope, error out of inclusive scope + + * the branch target statement must be valid + + * if the statement is not allowed as a branch target, error not a valid branch target + +* the *scalar-int-expr* needs to have ``INTEGER`` type + + * check the type of the expression (type checking done elsewhere) + +R853 *arithmetic-if-stmt* (F08:8.2.4) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +This control-flow construct is deleted in F18. + +.. code-block:: fortran + + IF (scalar-numeric-expr) label1,label2,label3 + +The arithmetic ``IF`` statement is like a three-way branch operator. If the scalar numeric expression is less than zero, control transfers to *label1*\ ; if it is equal to zero, to *label2*\ ; and if it is greater than zero, to *label3*. + +Semantics Checks +"""""""""""""""" + + +* the labels in the *arithmetic-if-stmt* triple must all be present in the inclusive scope (F08:C848) + + * if a label does not exist, error nonexistent label + * if a label is out of scope, error out of inclusive scope + +* the *scalar-numeric-expr* must not be ``COMPLEX`` (F08:C849) + + * check the type of the expression (type checking done elsewhere) + +*alt-return-spec* (15.5.1, R1525) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +An *alt-return-spec* is a Fortran control-flow construct that combines a return from a subroutine with a branch to a labelled statement in the calling routine, all in one operation. A typical implementation is for the subroutine to return a hidden integer, which is used as a key in the calling code to then, possibly, branch to a labelled statement in inclusive scope. 
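+
+For readers unfamiliar with alternate returns, the following is a minimal sketch of a
+call site carrying an *alt-return-spec* (the program, subroutine, and label are invented
+for this note; alternate return is an obsolescent feature, so a compiler may warn about it).
+Label ``10`` is the label that the checks in this section would need to validate against
+the inclusive scope of the ``CALL`` statement:
+
+.. code-block:: fortran
+
+   ! Hypothetical example; all names and labels are illustrative only.
+   program alt_return_demo
+     implicit none
+     integer :: n
+     n = -1
+     ! The *10 actual argument is an alt-return-spec; label 10 must be a
+     ! branch target in the inclusive scope of this CALL statement.
+     call check_sign(n, *10)
+     print *, 'non-negative'
+     stop
+   10 print *, 'negative'
+   contains
+     subroutine check_sign(k, *)
+       integer, intent(in) :: k
+       ! RETURN 1 selects the first alternate return supplied by the caller.
+       if (k < 0) return 1
+     end subroutine check_sign
+   end program alt_return_demo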
+ +The labels are passed by the calling routine. We want to check those labels at the call-site, that is instances of *alt-return-spec*. + +Semantics Checks +"""""""""""""""" + + +* each *alt-return-spec* (11.2.1 constraints, same as ``GOTO``\ ) + + * must refer to a label that is in inclusive scope of the ``CALL`` statement + + * if a label does not exist, error nonexistent label + * if a label is out of scope, error out of inclusive scope + + * the branch target statement must be valid + + * if the statement is not allowed as a branch target, error not a valid branch target + +**END**\ , **EOR**\ , **ERR** specifiers (12.11) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +These specifiers can appear in I/O statements and can transfer control to specific labelled statements under exceptional conditions like end-of-file, end-of-record, and other error conditions. (The PGI compiler adds code to test the results from the runtime routines to determine if these branches should take place.) + +Semantics Checks +"""""""""""""""" + + +* each END, EOR, and ERR specifier (11.2.1 constraints, same as ``GOTO``\ ) + + * must refer to a label that is in inclusive scope of the I/O statement + + * if a label does not exist, error nonexistent label + * if a label is out of scope, error out of inclusive scope + + * the branch target statement must be valid + + * if the statement is not allowed as a branch target, error not a valid branch target + +*assigned-goto-stmt* and *assign-stmt* (F90:8.2.4) +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Deleted feature since Fortran 95. + +The *assigned-goto-stmt* and *assign-stmt* were *action-stmt* in the Fortran 90 standard. They are included here for completeness. This pair of obsolete statements can (will) be enabled as part of the compiler's legacy Fortran support. + +The *assign-stmt* stores a *label* in an integer variable. 
The *assigned-goto-stmt* will then transfer control to the *label* stored in that integer variable. + +.. code-block:: fortran + + ASSIGN 10 TO i + ... + GOTO i (10,20,30) + +Semantic Checks +""""""""""""""" + + +* an *assigned-goto-stmt* cannot be a *do-term-action-stmt* (F90:R829) +* an *assigned-goto-stmt* cannot be a *do-term-shared-stmt* (F90:R833) +* constraints from (F90:R839) + + * each *label* in an optional *label-list* must be the statement label of a branch target statement that appears in the same scoping unit as the *assigned-goto-stmt* + * *scalar-int-variable* (\ ``i`` in the example above) must be named and of type default integer + * an integer variable that has been assigned a label may only be referenced in an *assigned-goto* or as a format specifier in an I/O statement + * when an I/O statement with a *format-specifier* that is an integer variable is executed or when an *assigned-goto* is executed, the variable must have been assigned a *label* + * an integer variable can only be assigned a label via the ``ASSIGN`` statement + * the label assigned to the variable must be in the same scoping unit as the *assigned-goto* that branches to the *label* value + * if the parameterized list of labels is present, the label value assigned to the integer variable must appear in that *label-list* + * a distinct *label* can appear more than once in the *label-list* + +Some interpretation is needed as the terms of the older standard are different. + +A "scoping unit" is defined as + + +* a derived-type definition +* a procedure interface body, excluding derived-types and interfaces contained within it +* a program unit or subprogram, excluding derived-types, interfaces, and subprograms contained within it + +This is a more lax definition of scope than inclusive scope. + +The term *named variable* distinguishes a variable, such as ``i``\ , from an element of an array, such as ``a(i)``. 
+ +Labels used in I/O +^^^^^^^^^^^^^^^^^^ + +Data transfer statements +~~~~~~~~~~~~~~~~~~~~~~~~ + +In data transfer (I/O) statements (e.g., ``READ``\ ), the user can specify a ``FMT=`` specifier that can take a label as its argument. (R1215) + +Semantic Checks +""""""""""""""" + + +* if the ``FMT=`` specifier has a label as its argument (C1230) + + * the label must correspond to a ``FORMAT`` statement + + * if the statement is not a ``FORMAT``\ , error statement must be a ``FORMAT`` + + * the labelled ``FORMAT`` statement must be in the same inclusive scope as the originating data transfer statement (also in 2008) + + * if the label statement does not exist, error label does not exist + * if the label statement is not in scope, error label is not in inclusive scope + + * Exceptions (errors demoted to warnings) + + * PGI extension: referenced ``FORMAT`` statements may appear in a host procedure + * Possible relaxation: the scope of the referenced ``FORMAT`` statement may be ignored, allowing a ``FORMAT`` to be referenced from any scope in the compilation. + +Construct Name generalities +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Various Fortran constructs can have names. These include + + +* the ``WHERE`` construct (10.2.3) +* the ``FORALL`` construct (10.2.4) +* the ``ASSOCIATE`` construct (11.1.3) +* the ``BLOCK`` construct (11.1.4) +* the ``CHANGE TEAM`` construct (11.1.5) +* the ``CRITICAL`` construct (11.1.6) +* the ``DO`` construct (11.1.7) +* the ``IF`` construct (11.1.8) +* the ``SELECT CASE`` construct (11.1.9) +* the ``SELECT RANK`` construct (11.1.10) +* the ``SELECT TYPE`` construct (11.1.11) + +Semantics Checks +~~~~~~~~~~~~~~~~ + +A construct name is a name formed under 6.2.2. A name is an identifier. Identifiers are parsed by the parser. + + +* the maximum length of a name is 63 characters (C601) + +Names must either not be given for the construct or used throughout when specified. 
+ + +* if a construct is given a name, the construct's ``END`` statement must also specify the same name (\ ``WHERE`` C1033, ``FORALL`` C1035, ...) +* ``WHERE`` has additional ``ELSEWHERE`` clauses +* ``IF`` has additional ``ELSE IF`` and ``ELSE`` clauses +* ``SELECT CASE`` has additional ``CASE`` clauses +* ``SELECT RANK`` has additional ``RANK`` clauses +* ``SELECT TYPE`` has additional *type-guard-stmt* + These additional statements must meet the same constraint as the ``END`` of the construct. Names must match, if present, or there must be no names for any of the clauses. + +``CYCLE`` statement (11.1.7.4.4) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``CYCLE`` statement takes an optional *do-construct-name*. + +Semantics Checks +~~~~~~~~~~~~~~~~ + + +* if the ``CYCLE`` has a *construct-name*\ , then the ``CYCLE`` statement must appear within that named *do-construct* (C1134) +* if the ``CYCLE`` does not have a *do-construct-name*\ , the ``CYCLE`` statement must appear within a *do-construct* (C1134) + +``EXIT`` statement (11.1.12) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``EXIT`` statement takes an optional *construct-name*. 
+ +Semantics Checks +~~~~~~~~~~~~~~~~ + + +* if the ``EXIT`` has a *construct-name*\ , then the ``EXIT`` statement must appear within that named construct (C1166) +* if the ``EXIT`` does not have a *construct-name*\ , the ``EXIT`` statement must appear within a *do-construct* (C1166) +* an *exit-stmt* must not appear in a ``DO CONCURRENT`` if the ``EXIT`` belongs to the ``DO CONCURRENT`` or an outer construct enclosing the ``DO CONCURRENT`` (C1167) +* an *exit-stmt* must not appear in a ``CHANGE TEAM`` (\ ``CRITICAL``\ ) if the ``EXIT`` belongs to an outer construct enclosing the ``CHANGE TEAM`` (\ ``CRITICAL``\ ) (C1168) diff --git a/flang/documentation/ModFiles.md b/flang/docs/ModFiles.rst rename from flang/documentation/ModFiles.md rename to flang/docs/ModFiles.rst --- a/flang/documentation/ModFiles.md +++ b/flang/docs/ModFiles.rst @@ -1,29 +1,24 @@ - - -# Module Files +Module Files +============ Module files hold information from a module that is necessary to compile program units that depend on the module. -## Name +Name +---- Module files must be searchable by module name. They are typically named -`.mod`. The advantage of using `.mod` is that it is consistent with +``.mod``. The advantage of using ``.mod`` is that it is consistent with other compilers so users will know what they are. Also, makefiles and scripts -often use `rm *.mod` to clean up. +often use ``rm *.mod`` to clean up. The disadvantage of using the same name as other compilers is that it is not -clear which compiler created a `.mod` file and files from multiple compilers +clear which compiler created a ``.mod`` file and files from multiple compilers cannot be in the same directory. This could be solved by adding something -between the module name and extension, e.g. `-f18.mod`. +between the module name and extension, e.g. ``-f18.mod``. -## Format +Format +------ Module files will be Fortran source. 
Declarations of all visible entities will be included, along with private @@ -32,7 +27,8 @@ a single *type-declaration-statement*. Executable statements will be omitted. -### Header +Header +^^^^^^ There will be a header containing extra information that cannot be expressed in Fortran. This will take the form of a comment or directive @@ -42,96 +38,113 @@ perform *ad hoc* parsing on it. If it's a directive the compiler could parse it like other directives as part of the grammar. Processing the header before parsing might result in better error messages -when the `.mod` file is invalid. +when the ``.mod`` file is invalid. Regardless of whether the header is a comment or directive we can use the -same string to introduce it: `!mod$`. +same string to introduce it: ``!mod$``. Information in the header: -- Magic string to confirm it is an f18 `.mod` file -- Version information: to indicate the version of the file format, in case it changes, + + +* Magic string to confirm it is an f18 ``.mod`` file +* Version information: to indicate the version of the file format, in case it changes, and the version of the compiler that wrote the file, for diagnostics. -- Checksum of the body of the current file -- Modules we depend on and the checksum of their module file when the current +* Checksum of the body of the current file +* Modules we depend on and the checksum of their module file when the current module file is created -- The source file that produced the `.mod` file? This could be used in error messages. +* The source file that produced the ``.mod`` file? This could be used in error messages. -### Body +Body +^^^^ The body will consist of minimal Fortran source for the required declarations. The order will match the order they first appeared in the source. 
Some normalization will take place: -- extraneous spaces will be removed -- implicit types will be made explicit -- attributes will be written in a consistent order -- entity declarations will be combined into a single declaration -- function return types specified in a *prefix-spec* will be replaced by + + +* extraneous spaces will be removed +* implicit types will be made explicit +* attributes will be written in a consistent order +* entity declarations will be combined into a single declaration +* function return types specified in a *prefix-spec* will be replaced by an entity declaration -- etc. +* etc. -#### Symbols included +Symbols included +~~~~~~~~~~~~~~~~ All public symbols from the module need to be included. In addition, some private symbols are needed: -- private types that appear in the public API -- private components of non-private derived types -- private parameters used in non-private declarations (initial values, kind parameters) -- others? + + +* private types that appear in the public API +* private components of non-private derived types +* private parameters used in non-private declarations (initial values, kind parameters) +* others? It might be possible to anonymize private names if users don't want them exposed -in the `.mod` file. (Currently they are readable in PGI `.mod` files.) +in the ``.mod`` file. (Currently they are readable in PGI ``.mod`` files.) -#### USE association +USE association +~~~~~~~~~~~~~~~ -A module that contains `USE` statements needs them represented in the -`.mod` file. +A module that contains ``USE`` statements needs them represented in the +``.mod`` file. Each use-associated symbol will be written as a separate *use-only* statement, possibly with renaming. Alternatives: -- Emit a single `USE` for each module, listing all of the symbols that were + + +* Emit a single ``USE`` for each module, listing all of the symbols that were use-associated in the *only-list*. 
-- Detect when all of the symbols from a module are imported (either by a *use-stmt* +* Detect when all of the symbols from a module are imported (either by a *use-stmt* without an *only-list* or because all of the public symbols of the module - have been listed in *only-list*s). In that case collapse them into a single *use-stmt*. -- Emit the *use-stmt*s that appeared in the original source. + have been listed in *only-list*\ s). In that case collapse them into a single *use-stmt*. +* Emit the *use-stmt*\ s that appeared in the original source. -## Reading and writing module files +Reading and writing module files +-------------------------------- -### Options +Options +^^^^^^^ The compiler will have command-line options to specify where to search for module files and where to write them. By default it will be the current directory for both. -For PGI, `-I` specifies directories to search for include files and module -files. `-module` specifics a directory to write module files in as well as to -search for them. gfortran is similar except it uses `-J` instead of `-module`. +For PGI, ``-I`` specifies directories to search for include files and module +files. ``-module`` specifics a directory to write module files in as well as to +search for them. gfortran is similar except it uses ``-J`` instead of ``-module``. The search order for module files is: -1. The `-module` directory (Note: for gfortran the `-J` directory is not searched). -2. The current directory -3. The `-I` directories in the order they appear on the command line -### Writing module files + +#. The ``-module`` directory (Note: for gfortran the ``-J`` directory is not searched). +#. The current directory +#. The ``-I`` directories in the order they appear on the command line + +Writing module files +^^^^^^^^^^^^^^^^^^^^ When writing a module file, if the existing one matches what would be written, the timestamp is not updated. Module files will be written after semantics, i.e. 
after the compiler has -determined the module is valid Fortran.
-**NOTE:** PGI does create `.mod` files sometimes even when the module has a +determined the module is valid Fortran.\ :raw-html-m2r:`
` +**NOTE:** PGI does create ``.mod`` files sometimes even when the module has a compilation error. Question: If the compiler can get far enough to determine it is compiling a module -but then encounters an error, should it delete the existing `.mod` file? +but then encounters an error, should it delete the existing ``.mod`` file? PGI does not, gfortran does. -### Reading module files +Reading module files +^^^^^^^^^^^^^^^^^^^^ -When the compiler finds a `.mod` file it needs to read, it firsts checks the first +When the compiler finds a ``.mod`` file it needs to read, it firsts checks the first line and verifies it is a valid module file. It can also verify checksums of modules it depends on and report if they are out of date. @@ -139,12 +152,15 @@ resolution to recreate the symbols from the module. Once the symbol table is populated the parse tree can be discarded. -When processing `.mod` files we know they are valid Fortran with these properties: -1. The input (without the header) is already in the "cooked input" format. -2. No preprocessing is necessary. -3. No errors can occur. +When processing ``.mod`` files we know they are valid Fortran with these properties: + + +#. The input (without the header) is already in the "cooked input" format. +#. No preprocessing is necessary. +#. No errors can occur. 
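The header check described under "Reading module files" might look roughly like the sketch below. The header layout (``!mod$ v1 sum:<digest>`` on the first line) and the use of a truncated SHA-256 digest are assumptions made for illustration; the text above only fixes the ``!mod$`` introducer and says that a version and a checksum are recorded.

```python
import hashlib

MAGIC = "!mod$"  # introducer named in the Header section


def body_checksum(body: str) -> str:
    # Assumption: any stable digest would do; this uses SHA-256 cut to 16 hex digits.
    return hashlib.sha256(body.encode("utf-8")).hexdigest()[:16]


def write_mod_file(body: str, version: int = 1) -> str:
    # Hypothetical layout: a one-line header followed by the Fortran body.
    return f"{MAGIC} v{version} sum:{body_checksum(body)}\n{body}"


def read_mod_file(text: str) -> str:
    """Validate the header and return the body, raising ValueError otherwise."""
    header, _, body = text.partition("\n")
    fields = header.split()
    if len(fields) != 3 or fields[0] != MAGIC:
        raise ValueError("not a module file")
    if not fields[1].startswith("v") or not fields[2].startswith("sum:"):
        raise ValueError("malformed module file header")
    if fields[2][len("sum:"):] != body_checksum(body):
        raise ValueError("module file checksum mismatch")
    return body
```

Checking the header before the body is parsed is what lets an invalid or stale ``.mod`` file be reported directly, rather than as a confusing parse error.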
-## Error messages referring to modules +Error messages referring to modules +----------------------------------- With this design, diagnostics can refer to names in modules and can emit a normalized declaration of an entity but not point to its location in the diff --git a/flang/documentation/OpenMP-4.5-grammar.txt b/flang/docs/OpenMP-4.5-grammar.txt rename from flang/documentation/OpenMP-4.5-grammar.txt rename to flang/docs/OpenMP-4.5-grammar.txt diff --git a/flang/docs/OpenMP-semantics.rst b/flang/docs/OpenMP-semantics.rst new file mode 100644 --- /dev/null +++ b/flang/docs/OpenMP-semantics.rst @@ -0,0 +1,734 @@ +OpenMP Semantic Analysis +======================== + +OpenMP for F18 +-------------- + + +#. Define and document the parse tree representation for + + * Directives (listed below) + * Clauses (listed below) + * Documentation + +#. All the directives and clauses need source provenance for messages +#. Define and document how an OpenMP directive in the parse tree + will be represented as the parent of the statement(s) + to which the directive applies. + The parser itself will not be able to construct this representation; + there will be subsequent passes that do so + just like for example *do-stmt* and *do-construct*. +#. Define and document the symbol table extensions +#. Define and document the module file extensions + +Directives +^^^^^^^^^^ + +OpenMP divides directives into three categories as follows. +The directives that are in the same categories share some characteristics. + +Declarative directives +~~~~~~~~~~~~~~~~~~~~~~ + +An OpenMP directive may only be placed in a declarative context. +A declarative directive results in one or more declarations only; +it is not associated with the immediate execution of any user code. 
+ +List of existing ones: + + +* declare simd +* declare target +* threadprivate +* declare reduction + +There is a parser node for each of these directives and +the parser node saves information associated with the directive, +for example, +the name of the procedure-name in the ``declare simd`` directive. + +Each parse tree node keeps source provenance, +one for the directive name itself and +one for the entire directive starting from the directive name. + +A top-level class, ``OpenMPDeclarativeConstruct``\ , +holds all four of the node types as discriminated unions +along with the source provenance for the entire directive +starting from ``!$OMP``. + +In ``parse-tree.h``\ , +``OpenMPDeclarativeConstruct`` is part +of the ``SpecificationConstruct`` and ``SpecificationPart`` +in F18 because +a declarative directive can only be placed in the specification part +of a Fortran program. + +All the ``Names`` or ``Designators`` associated +with the declarative directive will be resolved in later phases. + +Executable directives +~~~~~~~~~~~~~~~~~~~~~ + +An OpenMP directive that is **not** declarative. +That is, it may only be placed in an executable context. +It contains stand-alone directives and constructs +that are associated with code blocks. +The stand-alone directive is described in the next section. + +The constructs associated with code blocks listed below +share a similar structure: +*Begin Directive*\ , *Clause List*\ , *Code Block*\ , *End Directive*. +The *End Directive* is optional for constructs +like Loop-associated constructs. 
+ + +* Block-associated constructs (\ ``OpenMPBlockConstruct``\ ) +* Loop-associated constructs (\ ``OpenMPLoopConstruct``\ ) +* Atomic construct (\ ``OpenMPAtomicConstruct``\ ) +* Sections Construct (\ ``OpenMPSectionsConstruct``\ , + contains Sections/Parallel Sections constructs) +* Critical Construct (\ ``OpenMPCriticalConstruct``\ ) + +A top-level class, ``OpenMPConstruct``\ , +includes stand-alone directive and constructs +listed above as discriminated unions. + +In the ``parse-tree.h``\ , ``OpenMPConstruct`` is an element +of the ``ExecutableConstruct``. + +All the ``Names`` or ``Designators`` associated +with the executable directive will be resolved in Semantic Analysis. + +When the backtracking parser can not identify the associated code blocks, +the parse tree will be rewritten later in the Semantics Analysis. + +Stand-alone Directives +~~~~~~~~~~~~~~~~~~~~~~ + +An OpenMP executable directive that has no associated user code +except for that which appears in clauses in the directive. + +List of existing ones: + + +* taskyield +* barrier +* taskwait +* target enter data +* target exit data +* target update +* ordered +* flush +* cancel +* cancellation point + +A higher-level class is created for each category +which contains directives listed above that share a similar structure: + + +* OpenMPSimpleStandaloneConstruct + (taskyield, barrier, taskwait, + target enter/exit data, target update, ordered) +* OpenMPFlushConstruct +* OpenMPCancelConstruct +* OpenMPCancellationPointConstruct + +A top-level class, ``OpenMPStandaloneConstruct``\ , +holds all four of the node types as discriminated unions +along with the source provenance for the entire directive. +Also, each parser node for the stand-alone directive saves +the source provenance for the directive name itself. + +Clauses +^^^^^^^ + +Each clause represented as a distinct class in ``parse-tree.h``. +A top-level class, ``OmpClause``\ , +includes all the clauses as discriminated unions. 
+The parser node for ``OmpClause`` saves the source provenance +for the entire clause. + +All the ``Names`` or ``Designators`` associated +with the clauses will be resolved in Semantic Analysis. + +Note that the backtracking parser will not validate +that the list of clauses associated +with a directive is valid other than to make sure they are well-formed. +In particular, +the parser does not check that +the association between directive and clauses is correct +nor check that the values in the directives or clauses are correct. +These checks are deferred to later phases of semantics to simplify the parser. + +Symbol Table Extensions for OpenMP +---------------------------------- + +Name resolution can be impacted by the OpenMP code. +In addition to the regular steps to do the name resolution, +new scopes and symbols may need to be created +when encountering certain OpenMP constructs. +This section describes the extensions +for OpenMP during Symbol Table construction. + +OpenMP uses the fork-join model of parallel execution and +all OpenMP threads have access to +a *shared* memory place to store and retrieve variables +but each thread can also have access to +its *threadprivate* memory that must not be accessed by other threads. + +For the directives and clauses that can control the data environments, +compiler needs to determine two kinds of *access* +to variables used in the directive’s associated structured block: +**shared** and **private**. +Each variable referenced in the structured block +has an original variable immediately outside of the OpenMP constructs. +Reference to a shared variable in the structured block +becomes a reference to the original variable. +However, each private variable referenced in the structured block, +a new version of the original variable (of the same type and size) +will be created in the threadprivate memory. 
+ +There are exceptions that directives/clauses +need to create a new ``Symbol`` without creating a new ``Scope``\ , +but in general, +when encountering each of the data environment controlling directives +(discussed in the following sections), +a new ``Scope`` will be created. +For each private variable referenced in the structured block, +a new ``Symbol`` is created out of the original variable +and the new ``Symbol`` is associated +with original variable’s ``Symbol`` via ``HostAssocDetails``. +A new set of OpenMP specific flags are added +into ``Flag`` class in ``symbol.h`` to indicate the types of +associations, +data-sharing attributes, +and data-mapping attributes +in the OpenMP data environments. + +New Symbol without new Scope +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +OpenMP directives that require new ``Symbol`` to be created +but not new ``Scope`` are listed in the following table +in terms of the Symbol Table extensions for OpenMP: + + +.. raw:: html + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
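+   <table>
+     <tr>
+       <th rowspan="2" colspan="2">Directives/Clauses</th>
+       <th rowspan="2">Create New<br>Symbol<br>w/</th>
+       <th colspan="2">Add Flag</th>
+     </tr>
+     <tr><th>on Symbol of</th><th>Flag</th></tr>
+     <tr>
+       <td rowspan="4">Declarative Directives</td>
+       <td>declare simd [(proc-name)]</td>
+       <td>-</td>
+       <td>The name of the enclosing function, subroutine, or interface body to which it applies, or proc-name</td>
+       <td>OmpDeclareSimd</td>
+     </tr>
+     <tr>
+       <td>declare target</td>
+       <td>-</td>
+       <td>The name of the enclosing function, subroutine, or interface body to which it applies</td>
+       <td>OmpDeclareTarget</td>
+     </tr>
+     <tr>
+       <td>threadprivate(list)</td>
+       <td>-</td>
+       <td>named variables and named common blocks</td>
+       <td>OmpThreadPrivate</td>
+     </tr>
+     <tr>
+       <td>declare reduction</td>
+       <td>*</td>
+       <td>reduction-identifier</td>
+       <td>OmpDeclareReduction</td>
+     </tr>
+     <tr>
+       <td>Stand-alone directives</td>
+       <td>flush</td>
+       <td>-</td>
+       <td>variable, array section or common block name</td>
+       <td>OmpFlushed</td>
+     </tr>
+     <tr>
+       <td colspan="2">critical [(name)]</td>
+       <td>-</td>
+       <td>name (user-defined identifier)</td>
+       <td>OmpCriticalLock</td>
+     </tr>
+     <tr>
+       <td colspan="2">if ([ directive-name-modifier :] scalar-logical-expr)</td>
+       <td>-</td>
+       <td>directive-name-modifier</td>
+       <td>OmpIfSpecified</td>
+     </tr>
+   </table>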
+ + +.. code-block:: + + - No Action + + * Discussed in “Module File Extensions for OpenMP” section + + + +New Symbol with new Scope +^^^^^^^^^^^^^^^^^^^^^^^^^ + +For the following OpenMP regions: + + +* ``target`` regions +* ``teams`` regions +* ``parallel`` regions +* ``simd`` regions +* task generating regions (created by ``task`` or ``taskloop`` constructs) +* worksharing regions + (created by ``do``\ , ``sections``\ , ``single``\ , or ``workshare`` constructs) + +A new ``Scope`` will be created +when encountering the above OpenMP constructs +to ensure the correct data environment during the Code Generation. +To determine whether a variable referenced in these regions +needs the creation of a new ``Symbol``\ , +all the data-sharing attribute rules +described in OpenMP Spec [2.15.1] apply during the Name Resolution. +The available data-sharing attributes are: +**\ *shared*\ **\ , +**\ *private*\ **\ , +**\ *linear*\ **\ , +**\ *firstprivate*\ **\ , +and **\ *lastprivate*\ **. +The attribute is represented as ``Flag`` in the ``Symbol`` object. + +More details are listed in the following table: + + +.. raw:: html + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
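+   <table>
+     <tr><th rowspan="2">Attribute</th><th rowspan="2">Create New Symbol</th><th colspan="2">Add Flag</th></tr>
+     <tr><th>on Symbol of</th><th>Flag</th></tr>
+     <tr><td>shared</td><td>No</td><td>Original variable</td><td>OmpShared</td></tr>
+     <tr><td>private</td><td>Yes</td><td>New Symbol</td><td>OmpPrivate</td></tr>
+     <tr><td>linear</td><td>Yes</td><td>New Symbol</td><td>OmpLinear</td></tr>
+     <tr><td>firstprivate</td><td>Yes</td><td>New Symbol</td><td>OmpFirstPrivate</td></tr>
+     <tr><td>lastprivate</td><td>Yes</td><td>New Symbol</td><td>OmpLastPrivate</td></tr>
+   </table>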
+ + +To determine the right data-sharing attribute, +OpenMP defines that the data-sharing attributes +of variables that are referenced in a construct can be +*predetermined*\ , *explicitly determined*\ , or *implicitly determined*. + +Predetermined data-sharing attributes +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + + +* Assumed-size arrays are **shared** +* The loop iteration variable(s) + in the associated *do-loop(s)* of a + *do*\ , + *parallel do*\ , + *taskloop*\ , + or *distribute* construct + is (are) **private** +* A loop iteration variable + for a sequential loop in a *parallel* or task generating construct + is **private** in the innermost such construct that encloses the loop +* Implied-do indices and *forall* indices are **private** +* The loop iteration variable in the associated *do-loop* + of a *simd* construct with just one associated *do-loop* + is **linear** with a linear-step + that is the increment of the associated *do-loop* +* The loop iteration variables in the associated *do-loop(s)* of a *simd* + construct with multiple associated *do-loop(s)* are **lastprivate** + +Explicitly determined data-sharing attributes +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Variables with *explicitly determined* data-sharing attributes are: + + +* Variables are referenced in a given construct +* Variables are listed in a data-sharing attribute clause on the construct. + +The data-sharing attribute clauses are: + + +* *default* clause + (discussed in “Implicitly determined data-sharing attributes”) +* *shared* clause +* *private* clause +* *linear* clause +* *firstprivate* clause +* *lastprivate* clause +* *reduction* clause + (new ``Symbol`` created with the flag ``OmpReduction`` set) + +Note that variables with *predetermined* data-sharing attributes +may not be listed (with exceptions) in data-sharing attribute clauses. 
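The attribute table and the clause list above amount to a lookup from clause name to a pair: whether a new ``Symbol`` is created, and which flag is set. A minimal sketch follows, assuming clauses are available as a mapping from clause name to the variable names they list. The function and its data layout are hypothetical and are not how F18 stores clauses; only the flag names come from the text.

```python
# attribute -> (creates a new Symbol?, flag set on that Symbol)
DATA_SHARING = {
    "shared": (False, "OmpShared"),
    "private": (True, "OmpPrivate"),
    "linear": (True, "OmpLinear"),
    "firstprivate": (True, "OmpFirstPrivate"),
    "lastprivate": (True, "OmpLastPrivate"),
    "reduction": (True, "OmpReduction"),
}


def explicit_attributes(name, clauses):
    """List (attribute, new_symbol, flag) for each clause that names `name`.

    `clauses` maps a clause name to the variable names listed on it.
    Clauses that are not data-sharing attribute clauses are ignored.
    """
    return [
        (attr, *DATA_SHARING[attr])
        for attr, names in clauses.items()
        if attr in DATA_SHARING and name in names
    ]
```

An empty result means the variable has no explicitly determined attribute on that construct, so the predetermined or implicitly determined rules apply instead.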
+ +Implicitly determined data-sharing attributes +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +Variables with implicitly determined data-sharing attributes are: + + +* Variables are referenced in a given construct +* Variables do not have *predetermined* data-sharing attributes +* Variables are not listed in a data-sharing attribute clause + on the construct. + +Rules for variables with *implicitly determined* data-sharing attributes: + + +* In a *parallel* construct, if no *default* clause is present, + these variables are **shared** +* In a task generating construct, + if no *default* clause is present, + a variable for which the data-sharing attribute + is not determined by the rules above + and that in the enclosing context is determined + to be shared by all implicit tasks + bound to the current team is **shared** +* In a *target* construct, + variables that are not mapped after applying data-mapping attribute rules + (discussed later) are **firstprivate** +* In an orphaned task generating construct, + if no *default* clause is present, dummy arguments are **firstprivate** +* In a task generating construct, if no *default* clause is present, + a variable for which the data-sharing attribute is not determined + by the rules above is **firstprivate** +* For constructs other than task generating constructs or *target* constructs, + if no *default* clause is present, + these variables reference the variables with the same names + that exist in the enclosing context +* In a *parallel*\ , *teams*\ , or task generating construct, + the data-sharing attributes of these variables are determined + by the *default* clause, if present: + + * *default(shared)* + clause causes all variables referenced in the construct + that have *implicitly determined* data-sharing attributes + to be **shared** + * *default(private)* + clause causes all variables referenced in the construct + that have *implicitly determined* data-sharing attributes + to be **private** + * 
*default(firstprivate)* + clause causes all variables referenced in the construct + that have *implicitly determined* data-sharing attributes + to be **firstprivate** + * *default(none)* + clause requires that each variable + that is referenced in the construct, + and that does not have a *predetermined* data-sharing attribute, + must have its data-sharing attribute *explicitly determined* + by being listed in a data-sharing attribute clause + +Data-mapping Attribute +^^^^^^^^^^^^^^^^^^^^^^ + +When encountering the *target data* and *target* directives, +the data-mapping attributes of any variable referenced in a target region +will be determined and represented as ``Flag`` in the ``Symbol`` object +of the variable. +No ``Symbol`` or ``Scope`` will be created. + +The basic steps to determine the data-mapping attribute are: + + +#. If *map* clause is present, + the data-mapping attribute is determined by the *map-type* + on the clause and its corresponding ``Flag`` are listed below: + + +.. raw:: html + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
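+   <table>
+     <tr><th>data-mapping attribute</th><th>Flag</th></tr>
+     <tr><td>to</td><td>OmpMapTo</td></tr>
+     <tr><td>from</td><td>OmpMapFrom</td></tr>
+     <tr><td>tofrom (default if map-type is not present)</td><td>OmpMapTo &amp; OmpMapFrom</td></tr>
+     <tr><td>alloc</td><td>OmpMapAlloc</td></tr>
+     <tr><td>release</td><td>OmpMapRelease</td></tr>
+     <tr><td>delete</td><td>OmpMapDelete</td></tr>
+   </table>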
+ + + +#. Otherwise, the following data-mapping rules apply + for variables referenced in a *target* construct + that are *not* declared in the construct and + do not appear in data-sharing attribute or map clauses: + + * If a variable appears in a *to* or *link* clause + on a *declare target* directive then it is treated + as if it had appeared in a *map* clause with a *map-type* of **tofrom** + +#. Otherwise, the following implicit data-mapping attribute rules apply: + + * If a *defaultmap(tofrom:scalar)* clause is *not* present + then a scalar variable is not mapped, + but instead has an implicit data-sharing attribute of **firstprivate** + * If a *defaultmap(tofrom:scalar)* clause is present + then a scalar variable is treated as if it had appeared + in a map clause with a map-type of **tofrom** + * If a variable is not a scalar + then it is treated as if it had appeared in a map clause + with a *map-type* of **tofrom** + +After the completion of the Name Resolution phase, +all the data-sharing or data-mapping attributes marked for the ``Symbols`` +may be used later in the Semantics Analysis and in the Code Generation. + +Module File Extensions for OpenMP +--------------------------------- + +After the successful compilation of modules and submodules +that may contain the following Declarative Directives, +the entire directive starting from ``!$OMP`` needs to be written out +into ``.mod`` files in their corresponding Specification Part: + + +* + *declare simd* or *declare target* + + In the “New Symbol without new Scope” section, + we described that when encountering these two declarative directives, + new ``Flag`` will be applied to the Symbol of the name of + the enclosing function, subroutine, or interface body to + which it applies, or proc-name. + This ``Flag`` should be part of the API information + for the given subroutine or function + +* + *declare reduction* + + The *reduction-identifier* in this directive + can be use-associated or host-associated. 
+ However, it will not act like other Symbols + because user may have a reduction name + that is the same as a Fortran entity name in the same scope. + Therefore a specific data structure needs to be created + to save the *reduction-identifier* information + in the Scope and this directive needs to be written into ``.mod`` files + +Phases of OpenMP Analysis +------------------------- + + +#. Create the parse tree for OpenMP + + #. Add types for directives and clauses + + #. Add type(s) that will be used for directives + #. Add type(s) that will be used for clauses + #. Add other types, e.g. wrappers or other containers + #. Use std::variant to encapsulate meaningful types + + #. Implemented in the parser for OpenMP (openmp-grammar.h) + +#. Create canonical nesting + + #. Restructure parse tree to reflect the association + of directives and stmts + + #. Associate ``OpenMPLoopConstruct`` + with ``DoConstruct`` and ``OpenMPEndLoopDirective`` + + #. Investigate, and perhaps reuse, + the algorithm used to restructure do-loops + #. Add a pass near the code that restructures do-loops; + but do not extend the code that handles do-loop for OpenMP; + keep this code separate. + #. Report errors that prevent restructuring + (e.g. loop directive not followed by loop) + We should abort in case of errors + because there is no point to perform further checks + if it is not a legal OpenMP construct + +#. Validate the structured-block + + #. Structured-block is a block of executable statements + #. Single entry and single exit + #. Access to the structured block must not be the result of a branch + #. The point of exit cannot be a branch out of the structured block + +#. Check that directive and clause combinations are legal + + #. Begin and End directive should match + #. Simply check that the clauses are allowed by the directives + #. Write as a separate pass for simplicity and correctness of the parse tree + +#. Write parse tree tests + + #. 
At this point, the parse tree should be perfectly formed + #. Write tests that check for correct form and provenance information + #. Write tests for errors that can occur during the restructuring + +#. Scope, symbol tables, and name resolution + + #. Update the existing code to handle names and scopes introduced by OpenMP + #. Write tests to make sure names are properly implemented + +#. Check semantics that is specific to each directive + + #. Validate the directive and its clauses + #. Some clause checks require the result of name resolution, + i.e. “A list item may appear in a *linear* or *firstprivate* clause + but not both.” + #. TBD: + Validate the nested statement for legality in the scope of the directive + #. Check the nesting of regions [OpenMP 4.5 spec 2.17] + +#. Module file utilities + + #. Write necessary OpenMP declarative directives to ``.mod`` files + #. Update the existing code + to read available OpenMP directives from the ``.mod`` files diff --git a/flang/docs/OptionComparison.rst b/flang/docs/OptionComparison.rst new file mode 100644 --- /dev/null +++ b/flang/docs/OptionComparison.rst @@ -0,0 +1,1345 @@ +Compiler options +================ + +This document catalogs the options processed by F18's peers/competitors. Much of the document is taken up by a set of tables that list the options categorized into different topics. Some of the table headings link to more information about the contents of the tables. For example, the table on **Standards conformance** options links to `notes on Standards conformance <#standards>`_. + +**There's also important information in the **\ *\ `Notes section <#notes>`_\ *\ ** near the end of the document on how this data was gathered and what **\ *is*\ ** and **\ *is not*\ ** included in this document.** + +Note that compilers may support language features without having an option for them. Such cases are frequently, but not always noted in this document. + + +.. 
raw:: html + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Standards conformance +
Option Cray GNU IBM Intel PGI Flang
Overall conformance en, +

+ eN +

std=level qlanglvl, qsaa + stand level + Mstandard + Mstandard +
Compatibility with previous standards or implementations + N/A + fdec, +

+ fall-intrinsics +

qxlf77, +

+ qxlf90, +

+ qxlf2003, +

+ qxlf2008, +

+ qport +

f66, +

+ f77rtl, +

+ fpscomp, +

+ intconstant, +

+ nostandard-realloc-lhs, +

+ standard-semantics, +

+ assume nostd_intent_in, +

+ assume nostd_value, +

+ assume norealloc_lhs +

Mallocatable=95|03 + Mallocatable=95|03 +
+ + + +.. raw:: html + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Source format +
Option + Cray + GNU + IBM + Intel + PGI + Flang +
Fixed or free source + f free, +

+ f fixed +

ffree-form, +

+ ffixed-form +

qfree, +

+ qfixed +

fixed, +

+ free +

Mfree, +

+ Mfixed +

Mfreeform, +

+ Mfixed +

Source line length + N col + ffixed-line-length-n, +

+ ffree-line-length-n +

qfixed=n + extend-source [size] + Mextend + Mextend +
Column 1 comment specifier + ed + fd-lines-as-code, +

+ fd-lines-as-comments +

D, +

+ qdlines, +

+ qxlines +

d-lines + Mdlines + N/A +
Don't treat CR character as a line terminator + N/A + N/A + qnocr + N/A + N/A + N/A +
Source file naming + N/A + N/A + qsuffix + extfor, +

+ Tf filename +

N/A + N/A +
+ + + +.. raw:: html + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Names, Literals, and other tokens +
Option + Cray + GNU + IBM + Intel + PGI + Flang +
Max identifier length + N/A + fmax-identifier-length=n + N/A + N/A + N/A + N/A +
"$" in symbol names + N/A + fdollar-ok + default + default + N/A + N/A +
Allow names with leading "_" + eQ + N/A + N/A + N/A + N/A + N/A +
Specify name format + N/A + N/A + U + names=keyword + Mupcase + N/A +
Escapes in literals + N/A + fbackslash + qescape + assume bscc + Mbackslash + Mbackslash +
Allow multibyte characters in strings + N/A + N/A + qmbcs + N/A + N/A + N/A +
Create null terminated strings + N/A + N/A + qnullterm + N/A + N/A + N/A +
Character to use for "$" + N/A + N/A + N/A + N/A + Mdollar,char + +
Allow PARAMETER statements without parentheses + N/A + N/A + N/A + altparam + N/A + N/A +
+ + + +.. raw:: html + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
DO loop handling +
Option + Cray + GNU + IBM + Intel + PGI + Flang +
One trip DO loops + ej + N/A + 1, +

+ qonetrip +

f66 + Monetrip + N/A +
Allow branching into loops + eg + N/A + N/A + N/A + N/A + N/A +
+ + + +.. raw:: html + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
REAL, DOUBLE PRECISION, and COMPLEX Data +
Option + Cray + GNU + IBM + Intel + PGI + Flang +
Default REAL size + s real32, +

+ s real64, +

+ s default32, +

+ s default64 +

fdefault-real-[8|10|16] + qrealsize=[4|8] + real-size [32|64|128] + r[4|8] + r8, +

+ fdefault-real-8 +

Default DOUBLE PRECISION size + ep + fdefault-double-8 + N/A + double-size[64|128] + N/A + N/A +
Make real constants DOUBLE PRECISION + N/A + N/A + qdpc + N/A + N/A + N/A +
Promote or demote REAL type sizes + N/A + freal-[4|8]-real[4|8|10|16] + qautodbl=size + N/A + Mr8, +

+ Mr8intrinsics +

N/A +
Rounding mode + N/A + N/A + qieee + assume std_minus0_rounding + N/A + N/A +
Treatment of -0.0 + N/A + N/A + N/A + assume minus0 + + +
+ + + +.. raw:: html + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
INTEGER and LOGICAL Data +
Option + Cray + GNU + IBM + Intel + PGI + Flang +
Default INTEGER size + s integer32, +

+ s integer64, +

+ s default32, +

+ s default64 +

fdefault-integer-8 + qintsize=[2|4|8] + integer-size [32|64|128] + I[2|4|8], +

+ Mi4, +

+ Mnoi4 +

i8, +

+ fdefault-integer-8 +

Promote INTEGER sizes + N/A + finteger-4-integer-8 + N/A + N/A + N/A + N/A +
Enable 8 and 16 bit INTEGER and LOGICALS + eh + N/A + N/A + N/A + N/A + N/A +
Change how the compiler treats LOGICAL + N/A + N/A + N/A + N/A + Munixlogical + +
Treatment of numeric constants as arguments + N/A + N/A + qxlf77 oldboz + assume old_boz + N/A + N/A +
Treatment of assignment between numerics and logicals + N/A + N/A + N/A + assume old_logical_assign + N/A + N/A +
+ + + +.. raw:: html + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
CHARACTER and Pointer Data +
Option + Cray + GNU + IBM + Intel + PGI + Flang +
Use bytes for pointer arithmetic + s byte_pointer + N/A + N/A + N/A + N/A + N/A +
Use words for pointer arithmetic + s word_pointer + N/A + N/A + N/A + N/A + N/A +
Allow character constants for typeless constants + N/A + N/A + qctyplss + N/A + N/A + N/A +
+ + + +.. raw:: html + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Data types and allocation +
Option + Cray + GNU + IBM + Intel + PGI + Flang +
Default to IMPLICIT NONE + eI + fimplicit-none + u, qundef + warn declarations + Mdclchk + N/A +
Enable DEC STRUCTURE extensions + N/A + fdec-structure + N/A + N/A + default + N/A +
Enable Cray pointers + default + fcray-pointer + Default (near equivalent) + Default (near equivalent) + Mcray + N/A +
Allow bitwise logical operations on numeric + ee + N/A + qintlog + N/A + N/A + N/A +
Allow DEC STATIC and AUTOMATIC declarations + default + fdec-static + Default, see IMPLICIT STATIC and IMPLICIT AUTOMATIC + Default, see AUTOMATIC and STATIC + Default + N/A +
Allocate variables to static storage + ev + fno-automatic + qsave + save, +

+ noauto +

Mnorecursive, +

+ Msave +

N/A +
Compile procedures as if RECURSIVE + eR + frecursive + qrecur + assume recursion, +

+ recursive +

Mrecursive + Mrecursive +
+ + + +.. raw:: html + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Arrays +
Option + Cray + GNU + IBM + Intel + PGI + Flang +
Enable coarrays + h caf + fcoarray=key + N/A + coarray[=keyword] + N/A + N/A +
Contiguous array pointers + h contiguous + N/A + qassert=contig + assume contiguous_pointer + N/A + N/A +
Contiguous assumed shape dummy arguments + h contiguous_assumed_shape + frepack-arrays + qassert=contig + assume contiguous_assumed_shape + N/A + N/A +
+ + + +.. raw:: html + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
OpenACC, OpenMP, and CUDA +
Option + Cray + GNU + IBM + Intel + PGI + Flang +
Enable OpenACC + h acc + fopenacc + N/A + N/A + acc + N/A +
Enable OpenMP + h omp + fopenmp + qswapomp + qopenmp, +

+ qopenmp-lib, +

+ qopenmp-link, +

+ qopenmp-offload, +

+ qopenmp-simd, +

+ qopenmp-stubs, +

+ qopenmp-threadprivate +

mp, +

+ Mcuda +

-mp +
+ + + +.. raw:: html + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
Miscellaneous +
Option + Cray + GNU + IBM + Intel + PGI + Flang +
Disable compile time range checking + N/A + fno-range-check + N/A + N/A + N/A + N/A +
Disable call site checking + dC + N/A + N/A + N/A + N/A + N/A +
Warn for bad call checking + eb + N/A + N/A + N/A + N/A + N/A +
Set default accessibility of module entities to PRIVATE + N/A + fmodule-private + N/A + N/A + N/A + N/A +
Force FORALL to use temp + N/A + ftest-forall-temp + N/A + N/A + N/A + N/A +
+ + +:raw-html-m2r:``\ Notes +--------------------------------------------- + +**\ :raw-html-m2r:``\ Standards conformance:** + +All conformance options are similar -- they issue warnings if non-standard features are used. All defaults are to allow extensions without warnings. The GNU, IBM, and Intel compilers allow multiple standard levels to be specified. + + +* **Cray**\ : The capital "-eN" option specifies to issue error messages for non-compliance rather than warnings. +* **GNU:** The "std=\ *level*\ " option specifies the standard to which the program is expected to conform. The default value for std is 'gnu', which specifies a superset of the latest Fortran standard that includes all of the extensions supported by GNU Fortran, although warnings will be given for obsolete extensions not recommended for use in new code. The 'legacy' value is equivalent but without the warnings for obsolete extensions. The 'f95', 'f2003', 'f2008', and 'f2018' values specify strict conformance to the respective standards. Errors are given for all extensions beyond the relevant language standard, and warnings are given for the Fortran 77 features that are permitted but obsolescent in later standards. '-std=f2008ts' allows the Fortran 2008 standard including the additions of the Technical Specification (TS) 29113 on Further Interoperability of Fortran with C and TS 18508 on Additional Parallel Features in Fortran. 
Values for "\ *level*\ " are *f95, f2003, f2008, f2008ts, f2018, gnu,* and *legacy*. + +**\ :raw-html-m2r:``\ Source format:** + +**Fixed or free source:** Cray, IBM, and Intel default the source format based on the source file suffix as follows: + + +* **Cray** + + * **Free:** .f90, .F90, .f95, .F95, .f03, .F03, .f08, .F08, .ftn, .FTN + * **Fixed:** .f, .F, .for, .FOR + +* **Intel** + + * **Free:** .f90, .F90, .i90 + * **Fixed:** .f, .for, .FOR, .ftn, .FTN, .fpp, .FPP, .i + +IBM Fortran's options allow the source line length to be specified with the option, e.g., "-qfixed=72". IBM bases the default on the name of the command used to invoke the compiler. IBM has 16 different commands that invoke the Fortran compiler, and the default use of free or fixed format and the line length are based on the command name. -qfixed=72 is the default for the xlf, xlf_r, f77, and fort77 commands. -qfree=f90 is the default for the f90, xlf90, xlf90_r, f95, xlf95, xlf95_r, f2003, xlf2003, xlf2003_r, f2008, xlf2008, and xlf2008_r commands. The maximum line length for either source format is 132 characters. + +**Column 1 comment specifier:** All compilers allow "D" in column 1 to specify that the line contains a comment and have this as the default for fixed format source. IBM also supports an "X" in column 1 with the option "-qxlines". + +**Source line length:** + + +* **Cray:** The "-N *col*\ " option specifies the line width for fixed- and free-format source lines. The value used for col specifies the maximum number of columns per line. For free form sources, col can be set to 132, 255, or 1023. For fixed form sources, col can be set to 72, 80, 132, 255, or 1023. Characters in columns beyond the col specification are ignored. By default, lines are 72 characters wide for fixed-format sources and 255 characters wide for free-form sources. +* **GNU:** For both "ffixed-line-length-\ *n*\ " and "ffree-line-length-\ *n*\ " options, characters are ignored after the specified length. 
The default for fixed is 72. The default for free is 132. For free, you can specify 'none' as the length, which means that all characters in the line are meaningful. +* **IBM:** For **fixed**\ , the default is 72. For **free**\ , there's no default, but the maximum length for either form is 132. +* **Intel:** The default is 72 for **fixed** and 132 for **free**. +* **PGI, Flang:** + + * in free form, it is an error if the line is longer than 1000 characters + * in fixed form by default, characters after column 72 are ignored + * in fixed form with -Mextend, characters after column 132 are ignored + +**\ :raw-html-m2r:``\ Names, Literals, and other tokens** + +**Escapes in literals:** + + +* **GNU:** The "-fbackslash" option changes the interpretation of backslashes in string literals from a single backslash character to "C-style" escape characters. The combinations ``\a``, ``\b``, ``\f``, ``\n``, ``\r``, ``\t``, ``\v``, ``\\``, and ``\0`` are expanded to the ASCII characters alert, backspace, form feed, newline, carriage return, horizontal tab, vertical tab, backslash, and NUL, respectively. Additionally, ``\xnn``, ``\unnnn`` and ``\Unnnnnnnn`` (where each n is a hexadecimal digit) are translated into the Unicode characters corresponding to the specified code points. All other combinations of a character preceded by ``\`` are unexpanded. +* **Intel:** The option "-assume bscc" tells the compiler to treat the backslash character (``\``) as a C-style control (escape) character in character literals. "nobscc" specifies that the backslash character is treated as a normal character in character literals. This is the default. + +**"$" in symbol names:** Allowing "$" in names is controlled by an option in GNU and is the default behavior in IBM and Intel. Presumably, these compilers issue warnings when standard conformance options are enabled. Dollar signs in names don't seem to be allowed in Cray, PGI, or Flang. 
+ +**\ :raw-html-m2r:``\ DO loop handling** + +**One trip:** + + +* **IBM:** IBM has two options that do the same thing: "-1" and "-qonetrip". +* **Intel:** Intel used to support a "-onetrip" option, but it has been removed. Intel now supports a "-f66" option that ensures that DO loops are executed at least once in addition to `several other Fortran 66 semantic features `_. + +**\ :raw-html-m2r:``\ REAL, DOUBLE PRECISION, and COMPLEX Data** + +These size options affect the sizes of variables, literals, and intrinsic function results. + +**Default REAL sizes:** These options do not affect the size of explicitly declared data (for example, REAL(KIND=4)). + + +* **Cray:** The "-s default32" and "-s default64" options affect REAL, INTEGER, and LOGICAL types. + +**Default DOUBLE PRECISION:** These options allow control of the size of DOUBLE PRECISION types in conjunction with controlling REAL types. + + +* **Cray:** The "-ep" option controls DOUBLE PRECISION. This option can only be enabled when the default data size is 64 bits ("-s default64" or "-s real64"). When "-s default64" or "-s real64" is specified, and double precision arithmetic is disabled, DOUBLE PRECISION variables and constants specified with the D exponent are converted to default real type (64-bit). If double precision is enabled ("-ep"), they are handled as a double precision type (128-bit). Similarly, when the "-s default64" or "-s real64" option is used, variables declared on a DOUBLE COMPLEX statement and complex constants specified with the D exponent are mapped to the complex type in which each part has a default real type, so the complex variable is 128-bit. If double precision is enabled ("-ep"), each part has double precision type, so the double complex variable is 256-bit. +* **GNU:** The "-fdefault-double-8" option sets the DOUBLE PRECISION type to an 8 byte wide type. It does nothing if this is already the default. 
If "-fdefault-real-8" is given, DOUBLE PRECISION would instead be promoted to 16 bytes if possible, and "-fdefault-double-8" can be used to prevent this. The kind of real constants like 1.d0 will not be changed by "-fdefault-real-8" though, so also "-fdefault-double-8" does not affect it. + +**Promote or demote REAL type sizes:** These options change the meaning of data types specified by declarations of the form REAL(KIND=\ *N*\ ), except, perhaps for PGI. + + +* **GNU:** The allowable combinations are "-freal-4-real-8", "-freal-4-real-10", "-freal-4-real-16", "-freal-8-real-4", "-freal-8-real-10", and "-freal-8-real-16". +* **IBM:** The "-qautodbl" option is documented `here `_. +* **PGI:** The "-Mr8" option promotes REAL variables and constants to DOUBLE PRECISION variables and constants, respectively. DOUBLE PRECISION elements are 8 bytes in length. The "-Mr8intrinsics" option promotes the intrinsics CMPLX and REAL as DCMPLX and DBLE, respectively. + +**\ :raw-html-m2r:``\ INTEGER and LOGICAL Data** + +These size options affect the sizes of variables, literals, and intrinsic function results. + +**Default INTEGER sizes:** For all compilers, these options affect both INTEGER and LOGICAL types. + +**Enable 8 and 16 bit INTEGER and LOGICAL:** This Cray option ("-eh") enables support for 8-bit and 16-bit INTEGER and LOGICAL types that use explicit kind or star values. By default ("-eh"), data objects declared as INTEGER(kind=1) or LOGICAL(kind=1) are 8 bits long, and objects declared as INTEGER(kind=2) or LOGICAL(kind=2) are 16 bits long. When this option is disabled ("-dh"), data objects declared as INTEGER(kind=1), INTEGER(kind=2), LOGICAL(kind=1), or LOGICAL(kind=2) are 32 bits long. + +**Intrinsic functions** + +GNU is the only compiler with options governing the use of non-standard intrinsics. For more information on the GNU options, see `here `_. All compilers implement non-standard intrinsics but don't have options that affect access to them. 
+ +**\ :raw-html-m2r:``\ Arrays** + +**Contiguous array pointers:** All vendors that implement this option (Cray, IBM, and Intel) seem to apply it to all pointer targets. Assuming that the arrays targeted by the pointers are contiguous allows greater optimization. + +**Contiguous assumed shape dummy arguments:** Cray and Intel have a separate option that's specific to assumed shape dummy arguments. + +**\ :raw-html-m2r:``\ Miscellaneous** + +**Disable call site checking:** This Cray option ("-dC") disables some types of standard call site checking. The current Fortran standard requires that the number and types of arguments must agree between the caller and callee. These constraints are enforced in cases where the compiler can detect them. However, specifying "-dC" disables some of this error checking, which may be necessary in order to get some older Fortran codes to compile. If error checking is disabled, unexpected compile-time or run time results may occur. The compiler by default attempts to detect situations in which an interface block should be specified but is not. Specifying "-dC" disables this type of checking as well. + +**Warn for bad call checking**\ : This Cray option ("-eb") issues a warning message rather than an error message when the compiler detects a call to a procedure with one or more dummy arguments having the TARGET, VOLATILE or ASYNCHRONOUS attribute and there is not an explicit interface definition. + +Notes +----- + +What is and is not included +^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +This document focuses on options relevant to the Fortran language. This includes some features (such as recursion) that are only indirectly related. 
Options related to the following areas are not included: + + +* Input/Output +* Optimization +* Preprocessing +* Inlining +* Alternate library definition or linking +* Choosing file locations for compiler input or output +* Modules +* Warning and error messages and listing output +* Data initialization +* Run time checks +* Debugging +* Specification of operating system +* Target architecture +* Assembler generation +* Threads or parallelization +* Profiling and code coverage + +Data sources +^^^^^^^^^^^^ + +Here's the list of compilers surveyed, each linked to the source of data on it. Note that this is the only mention of the Oracle and NAG compilers in this document. + + +* `Cray Fortran Reference Manual version 8.7 `_ +* IBM (XLF) version 14.1 -- `Compiler Reference `_\ , `Language Reference `_ +* `Intel Fortran version 19.0 `_ +* `GNU Fortran Compiler version 8.3.0 `_ +* `NAG Fortran Release 6.2 `_ +* `Oracle Fortran version 819-0492-10 `_ +* PGI -- `Compiler Reference version 19.1 `_\ , `Fortran Reference Guide version 17 `_ +* `Flang `_ -- information from GitHub + +This document has been kept relatively small by providing links to much of the information about options rather than duplicating that information. For IBM, Intel, and some PGI options, there are direct links. But direct links were not possible for Cray, GNU and some PGI options. + +Many compilers have options that can either be enabled or disabled. Some compilers indicate this by the presence or absence of the letters "no" in the option name (IBM, Intel, and PGI) while Cray precedes many options with either "e" for enabled or "d" for disabled. This document only includes the enabled version of the option specification. + +Deprecated options were generally ignored, even though they were documented. 
diff --git a/flang/docs/Overview.rst b/flang/docs/Overview.rst new file mode 100644 --- /dev/null +++ b/flang/docs/Overview.rst @@ -0,0 +1,111 @@ +Overview of Compiler Phases +=========================== + +Each phase produces either correct output or fatal errors. + +Prescan and Preprocess +---------------------- + +See: `Preprocessing.rst `_. + +**Input:** Fortran source and header files, command line macro definitions, + set of enabled compiler directives (to be treated as directives rather than + comments). + +**Output:** + + +* A "cooked" character stream: the entire program as a contiguous stream of + normalized Fortran source. + Extraneous whitespace and comments are removed (except comments that are + compiler directives that are not disabled) and case is normalized. +* Provenance information mapping each character back to the source it came from. + This is used in subsequent phases to issue error messages that refer to source locations. + +**Entry point:** ``parser::Parsing::Prescan`` + +**Command:** ``f18 -E src.f90`` dumps the cooked character stream + +Parse +----- + +**Input:** Cooked character stream. + +**Output:** A parse tree representing a syntactically correct program, + rooted at a ``parser::Program``. + See: `Parsing.rst `_ and `ParserCombinators.rst `_. + +**Entry point:** ``parser::Parsing::Parse`` + +**Command:** + + +* ``f18 -fdebug-dump-parse-tree -fparse-only src.f90`` dumps the parse tree +* ``f18 -funparse src.f90`` converts the parse tree to normalized Fortran + +Validate Labels and Canonicalize Do Statements +---------------------------------------------- + +**Input:** Parse tree. + +**Output:** The parse tree with label constraints and construct names checked, + and each ``LabelDoStmt`` converted to a ``NonLabelDoStmt``. + See: `LabelResolution.rst `_. 
+ +**Entry points:** ``semantics::ValidateLabels``\ , ``parser::CanonicalizeDo`` + +Resolve Names +------------- + +**Input:** Parse tree (without ``LabelDoStmt``\ ) and ``.mod`` files from compilation + of USEd modules. + +**Output:** + + +* Tree of scopes populated with symbols and types +* Parse tree with some refinements: + + * each ``parser::Name::symbol`` field points to one of the symbols + * each ``parser::TypeSpec::declTypeSpec`` field points to one of the types + * array element references that were parsed as function references or + statement functions are corrected + +**Entry points:** ``semantics::ResolveNames``\ , ``semantics::RewriteParseTree`` + +**Command:** ``f18 -fdebug-dump-symbols -fparse-only src.f90`` dumps the + tree of scopes and symbols in each scope + +Check DO CONCURRENT Constraints +------------------------------- + +**Input:** Parse tree with names resolved. + +**Output:** Parse tree with semantically correct DO CONCURRENT loops. + +Write Module Files +------------------ + +**Input:** Parse tree with names resolved. + +**Output:** For each module and submodule, a ``.mod`` file containing a minimal + Fortran representation suitable for compiling program units that depend on it. + See `ModFiles.rst `_. + +Analyze Expressions and Assignments +----------------------------------- + +**Input:** Parse tree with names resolved. + +**Output:** Parse tree with ``parser::Expr::typedExpr`` filled in and semantic + checks performed on all expressions and assignment statements. + +**Entry points**\ : ``semantics::AnalyzeExpressions``\ , ``semantics::AnalyzeAssignments`` + +Produce the Intermediate Representation +--------------------------------------- + +**Input:** Parse tree with names and labels resolved. + +**Output:** An intermediate representation of the executable program. + See `FortranIR.rst `_. 
diff --git a/flang/docs/ParserCombinators.rst b/flang/docs/ParserCombinators.rst new file mode 100644 --- /dev/null +++ b/flang/docs/ParserCombinators.rst @@ -0,0 +1,178 @@ +Concept +------- + +The Fortran language recognizer here can be classified as an LL recursive +descent parser. It is composed from a *parser combinator* library that +defines a few fundamental parsers and a few ways to compose them into more +powerful parsers. + +For our purposes here, a *parser* is any object that attempts to recognize +an instance of some syntax from an input stream. It may succeed or fail. +On success, it may return some semantic value to its caller. + +In C++ terms, a parser is any instance of a class that + + +#. has a ``constexpr`` default constructor, +#. defines a type named ``resultType``\ , and +#. provides a function (\ ``const`` member or ``static``\ ) that accepts a reference to a + ``ParseState`` as its argument and returns a ``std::optional<resultType>`` as a + result, with the presence or absence of a value in the ``std::optional<>`` + signifying success or failure, respectively. + .. code-block:: + + std::optional<resultType> Parse(ParseState &) const; + The ``resultType`` of a parser is typically the class type of some particular + node type in the parse tree. + +``ParseState`` is a class that encapsulates a position in the source stream, +collects messages, and holds a few state flags that determine tokenization +(e.g., are we in a character literal?). Instances of ``ParseState`` are +independent and complete -- they are cheap to duplicate whenever necessary to +implement backtracking. + +The ``constexpr`` default constructor of a parser is important. The functions +(below) that operate on instances of parsers are themselves all ``constexpr``. +This use of compile-time expressions allows the entirety of a recursive +descent parser for a language to be constructed at compilation time through +the use of templates. 
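As a rough illustration of the three requirements listed in the Concept section, the sketch below defines one tiny parser. It is not F18 code: the ``ParseState`` here is a hypothetical stand-in that only tracks a cursor (the real class also collects messages and holds state flags), and ``DigitParser`` and ``digitParser`` are names invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-in for F18's ParseState: only a cursor over the input.
struct ParseState {
  std::string_view text;
  std::size_t at{0};
  bool IsAtEnd() const { return at >= text.size(); }
};

// A parser in the sense described above: it has a constexpr default
// constructor, defines resultType, and provides a const Parse() member
// that returns std::optional<resultType>.
struct DigitParser {
  using resultType = int;
  constexpr DigitParser() = default;
  std::optional<resultType> Parse(ParseState &state) const {
    if (!state.IsAtEnd()) {
      char ch{state.text[state.at]};
      if (ch >= '0' && ch <= '9') {
        ++state.at; // consume the digit only on success
        return ch - '0';
      }
    }
    return std::nullopt; // failure: the cursor is left where it was
  }
};

// The constexpr constructor lets the parser object exist at compile time.
constexpr DigitParser digitParser{};
```

Calling ``digitParser.Parse(state)`` on a state positioned at a digit yields the digit's value and advances the cursor by one; on any other input it yields an empty ``std::optional`` and leaves the cursor alone.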
+ +Fundamental Predefined Parsers +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +These objects and functions are (or return) the fundamental parsers: + + +* ``ok`` is a trivial parser that always succeeds without advancing. +* ``pure(x)`` returns a trivial parser that always succeeds without advancing, + returning some value ``x``. +* ``pure<T>()`` is ``pure(T{})`` but does not require that T be copy-constructible. +* ``fail<T>(msg)`` denotes a trivial parser that always fails, emitting the + given message as a side effect. The template parameter is the type of + the value that the parser never returns. +* ``nextCh`` consumes the next character and returns its location, + and fails at EOF. +* ``"xyz"_ch`` succeeds if the next character consumed matches any of those + in the string and returns its location. Be advised that the source + will have been normalized to lower case (minuscule) letters outside + character and Hollerith literals and edit descriptors before parsing. + +Combinators +^^^^^^^^^^^ + +These functions and operators combine existing parsers to generate new parsers. +They are ``constexpr``\ , so they should be viewed as type-safe macros. + + +* ``!p`` succeeds if p fails, and fails if p succeeds. +* ``p >> q`` fails if p does, otherwise running q and returning its value when + it succeeds. +* ``p / q`` fails if p does, otherwise running q and returning p's value + if q succeeds. +* ``p || q`` succeeds if p does, otherwise running q. The two parsers must + have the same type, and the value returned by the first succeeding parser + is the value of the combination. +* ``first(p1, p2, ...)`` returns the value of the first parser that succeeds. + All of the parsers in the list must return the same type. + It is essentially the same as ``p1 || p2 || ...`` but has a slightly + faster implementation and may be easier to format in your code. +* ``lookAhead(p)`` succeeds if p does, but doesn't modify any state. 
+* ``attempt(p)`` succeeds if p does, safely preserving state on failure. +* ``many(p)`` recognizes a greedy sequence of zero or more nonempty successes + of p, and returns ``std::list<>`` of their values. It always succeeds. +* ``some(p)`` recognizes a greedy sequence of one or more successes of p. + It fails if p immediately fails. +* ``skipMany(p)`` is the same as ``many(p)``\ , but it discards the results. +* ``maybe(p)`` tries to match p, returning a ``std::optional<T>`` value. + It always succeeds. +* ``defaulted(p)`` matches p, and when p fails it returns a + default-constructed instance of p's resultType. It always succeeds. +* ``nonemptySeparated(p, q)`` repeatedly matches "p q p q p q ... p", + returning a ``std::list<>`` of only the values of the p's. It fails if + p immediately fails. +* ``extension(p)`` parses p if strict standard compliance is disabled, + or with a warning if nonstandard usage warnings are enabled. +* ``deprecated(p)`` parses p if strict standard compliance is disabled, + with a warning if deprecated usage warnings are enabled. +* ``inContext(msg, p)`` runs p within an error message context; any + message that ``p`` generates will be tagged with ``msg`` as its + context. Contexts may nest. +* ``withMessage(msg, p)`` succeeds if ``p`` does, and if it does not, + it discards the messages from ``p`` and fails with the specified message. +* ``recovery(p, q)`` is equivalent to ``p || q``\ , except that error messages + generated from the first parser are retained, and a flag is set in + the ParseState to remember that error recovery was necessary. +* ``localRecovery(msg, p, q)`` is equivalent to + ``recovery(withMessage(msg, p), q >> pure<A>())`` where ``A`` is the + result type of 'p'. + It is useful for targeted error recovery situations within statements. + +Note that + +.. code-block:: + + a >> b >> c / d / e + +matches a sequence of five parsers, but returns only the result that was +obtained by matching ``c``. 
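The behavior of ``a >> b >> c / d / e`` follows from C++ operator precedence: ``/`` binds more tightly than ``>>``, so the expression groups as ``(a >> b) >> ((c / d) / e)``. The sketch below reproduces just that behavior under simplifying assumptions. It is not the F18 implementation: ``ParseState`` is a hypothetical cursor-only stand-in, ``Ch``, ``SequenceSecond``, ``SequenceFirst``, and ``grammar`` are names invented here, and no attempt is made to restore state after a partial match fails.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical stand-in for ParseState: only a cursor over the input.
struct ParseState {
  std::string_view text;
  std::size_t at{0};
};

// Matches exactly the character C and returns it.
template <char C> struct Ch {
  using resultType = char;
  constexpr Ch() = default;
  std::optional<resultType> Parse(ParseState &state) const {
    if (state.at < state.text.size() && state.text[state.at] == C) {
      ++state.at;
      return C;
    }
    return std::nullopt;
  }
};

// p >> q: fails if p does; otherwise runs q and returns q's value.
template <typename PA, typename PB> struct SequenceSecond {
  using resultType = typename PB::resultType;
  PA pa;
  PB pb;
  constexpr SequenceSecond() = default;
  constexpr SequenceSecond(PA a, PB b) : pa{a}, pb{b} {}
  std::optional<resultType> Parse(ParseState &state) const {
    if (pa.Parse(state)) {
      return pb.Parse(state);
    }
    return std::nullopt;
  }
};

// p / q: fails if p does; otherwise runs q and returns p's value if q succeeds.
template <typename PA, typename PB> struct SequenceFirst {
  using resultType = typename PA::resultType;
  PA pa;
  PB pb;
  constexpr SequenceFirst() = default;
  constexpr SequenceFirst(PA a, PB b) : pa{a}, pb{b} {}
  std::optional<resultType> Parse(ParseState &state) const {
    if (auto a{pa.Parse(state)}) {
      if (pb.Parse(state)) {
        return a;
      }
    }
    return std::nullopt;
  }
};

// The operators apply only to types that define resultType (checked by SFINAE).
template <typename PA, typename PB, typename = typename PA::resultType,
    typename = typename PB::resultType>
constexpr SequenceSecond<PA, PB> operator>>(PA pa, PB pb) {
  return SequenceSecond<PA, PB>{pa, pb};
}

template <typename PA, typename PB, typename = typename PA::resultType,
    typename = typename PB::resultType>
constexpr SequenceFirst<PA, PB> operator/(PA pa, PB pb) {
  return SequenceFirst<PA, PB>{pa, pb};
}

// Five parsers in sequence; only the value matched by the 'c' parser is kept.
constexpr auto grammar =
    Ch<'a'>{} >> Ch<'b'>{} >> Ch<'c'>{} / Ch<'d'>{} / Ch<'e'>{};
```

Because the operators are ``constexpr``, the whole ``grammar`` object is built at compile time, which mirrors the "type-safe macros" view of combinators taken in this section.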
+ +Applicatives +^^^^^^^^^^^^ + +The following *applicative* combinators combine parsers and modify or +collect the values that they return. + + +* ``construct<T>(p1, p2, ...)`` matches zero or more parsers in succession, + collecting their results and then passing them with move semantics to a + constructor for the type T if they all succeed. + If there is a single parser as the argument and it returns no usable + value but only success or failure (\ *e.g.,* ``"IF"_tok``\ ), the default + nullary constructor of the type ``T`` is called. +* ``sourced(p)`` matches p, and fills in its ``source`` data member with the + locations of the cooked character stream that it consumed +* ``applyFunction(f, p1, p2, ...)`` matches one or more parsers in succession, + collecting their results and passing them as rvalue reference arguments to + some function, returning its result. +* ``applyLambda([](&&x){}, p1, p2, ...)`` is the same thing, but for lambdas + and other function objects. +* ``applyMem(mf, p1, p2, ...)`` is the same thing, but invokes a member + function of the result of the first parser for updates in place. + +Token Parsers +^^^^^^^^^^^^^ + +Last, we have these basic parsers on which the actual grammar of Fortran +is built. All of the following parsers consume characters acquired from +``nextCh``. + + +* ``space`` always succeeds after consuming any spaces +* ``spaceCheck`` always succeeds after consuming any spaces, and can emit + a warning if there was no space in free form code before a character + that could continue a name or keyword +* ``digit`` matches one cooked decimal digit (0-9) +* ``letter`` matches one cooked letter (A-Z) +* ``"..."_tok`` matches the content of the string, skipping spaces before and + after. Internal spaces are optional matches. The ``_tok`` suffix is + optional when the parser appears before the combinator ``>>`` or after + the combinator ``/``. 
+* ``"..."_sptok`` is a string match in which the spaces are required in + free form source. +* ``"..."_id`` is a string match for a complete identifier (not a prefix of + a longer identifier or keyword). +* ``parenthesized(p)`` is shorthand for ``"(" >> p / ")"``. +* ``bracketed(p)`` is shorthand for ``"[" >> p / "]"``. +* ``nonEmptyList(p)`` matches a comma-separated list of one or more + instances of p. +* ``nonEmptyList(errorMessage, p)`` is equivalent to + ``withMessage(errorMessage, nonemptyList(p))``\ , which allows one to supply + a meaningful error message in the event of an empty list. +* ``optionalList(p)`` is the same thing, but can be empty, and always succeeds. + +Debugging Parser +^^^^^^^^^^^^^^^^ + +Last, a string literal ``"..."_debug`` denotes a parser that emits the string to +``llvm::errs`` and succeeds. It is useful for tracing while debugging a parser but should +obviously not be committed for production code. diff --git a/flang/documentation/Parsing.md b/flang/docs/Parsing.rst rename from flang/documentation/Parsing.md rename to flang/docs/Parsing.rst --- a/flang/documentation/Parsing.md +++ b/flang/docs/Parsing.rst @@ -1,13 +1,6 @@ - - The F18 Parser ============== + This program source code implements a parser for the Fortran programming language. @@ -44,13 +37,16 @@ Prescanning and Preprocessing ----------------------------- + The first pass is performed by an instance of the Prescanner class, with help from an instance of Preprocessor. 
The prescanner generates the "cooked character stream", implemented by a CookedSource class instance, in which: + + * line ends have been normalized to single ASCII LF characters (UNIX newlines) -* all `INCLUDE` files have been expanded +* all ``INCLUDE`` files have been expanded * all continued Fortran source lines have been unified * all comments and insignificant spaces have been removed * fixed form right margins have been clipped @@ -58,7 +54,7 @@ and Hollerith constants * preprocessing directives have been implemented * preprocessing macro invocations have been expanded -* legacy `D` lines in fixed form source have been omitted or included +* legacy ``D`` lines in fixed form source have been omitted or included * except for the payload in character literals, Hollerith constants, and character and Hollerith edit descriptors, all letters have been normalized to lower case @@ -74,7 +70,7 @@ all of the source-level concerns in the preceding list. The implementation of the preprocessor interacts with the prescanner by -means of _token sequences_. These are partitionings of input lines into +means of *token sequences*. These are partitionings of input lines into contiguous virtual blocks of characters, and are the only place in this Fortran compiler in which we have reified a tokenization of the program source; the parser proper does not have a tokenizer. The prescanner @@ -102,6 +98,7 @@ Source Provenance ----------------- + The prescanner constructs a chronicle of every file that is read by the parser, viz. the original source file and all others that it directly or indirectly includes. One copy of the content of each of these files @@ -120,15 +117,16 @@ Further, every byte in the cooked character stream supplied by the prescanner to the parser can be inexpensively mapped to its provenance. 
-Simple `const char *` pointers to characters in the cooked character +Simple ``const char *`` pointers to characters in the cooked character stream, or to contiguous ranges thereof, are used as source position indicators within the parser and in the parse tree. Messages -------- + Message texts, and snprintf-like formatting strings for constructing messages, are instantiated in the various components of the parser with -C++ user defined character literals tagged with `_err_en_US` and `_en_US` +C++ user defined character literals tagged with ``_err_en_US`` and ``_en_US`` (signifying fatality and language, with the default being the dialect of English used in the United States) so that they may be easily identified for localization. As described above, messages are associated with @@ -136,6 +134,7 @@ The Parse Tree -------------- + Each of the ca. 450 numbered requirement productions in the standard Fortran language grammar, as well as the productions implied by legacy extensions and preserved obsolescent features, maps to a distinct class @@ -148,14 +147,16 @@ reading that document). 
Three paradigms collectively implement most of the parse tree classes: -* *wrappers*, in which a single data member `v` has been encapsulated + + +* *wrappers*\ , in which a single data member ``v`` has been encapsulated in a new type * *tuples* (or product types), in which several values of arbitrary - types have been encapsulated in a single data member `t` whose type - is an instance of `std::tuple<>` + types have been encapsulated in a single data member ``t`` whose type + is an instance of ``std::tuple<>`` * *discriminated unions* (or sum types), in which one value whose type is a dynamic selection from a set of distinct types is saved in a data - member `u` whose type is an instance of `std::variant<>` + member ``u`` whose type is an instance of ``std::variant<>`` The use of these patterns is a design convenience, and exceptions to them are not uncommon wherever it made better sense to write custom definitions. @@ -176,6 +177,7 @@ Parsing ------- + This compiler attempts to recognize the entire cooked character stream (see above) as a Fortran program. It records the reductions made during a successful recognition as a parse tree value. The recognized grammar @@ -197,7 +199,7 @@ localized backtracking (specifically, it will not backtrack into a successful reduction to try its other alternatives). It is not generated as a table or code from a specification of the Fortran grammar; rather, it -_is_ the grammar, as declaratively respecified in C++ constant expressions +*is* the grammar, as declaratively respecified in C++ constant expressions using a small collection of basic token recognition objects and a library of "parser combinator" template functions that compose them to form more complicated recognizers and their correspondences to the construction @@ -205,6 +207,7 @@ Unparsing --------- + Parse trees can be converted back into free form Fortran source code. 
This formatter is not really a classical "pretty printer", but is more of a data structure dump whose output is suitable for compilation diff --git a/flang/docs/Preprocessing.rst b/flang/docs/Preprocessing.rst new file mode 100644 --- /dev/null +++ b/flang/docs/Preprocessing.rst @@ -0,0 +1,227 @@ +Fortran Preprocessing +===================== + +Behavior common to (nearly) all compilers: +------------------------------------------ + + +* Macro and argument names are sensitive to case. +* Fixed form right margin clipping after column 72 (or 132) + has precedence over macro name recognition, and also over + recognition of function-like parentheses and arguments. +* Fixed form right margin clipping does not apply to directive lines. +* Macro names are not recognized as such when spaces are inserted + into their invocations in fixed form. + This includes spaces at the ends of lines that have been clipped + at column 72 (or whatever). +* Text is rescanned after expansion of macros and arguments. +* Macros are not expanded within quoted character literals or + quoted FORMAT edit descriptors. +* Macro expansion occurs before any effective token pasting via fixed form + space removal. +* C-like line continuations with backslash-newline are allowed in + directives, including the definitions of macro bodies. +* ``/* Old style C comments */`` are ignored in directives and + removed from the bodies of macro definitions. +* ``// New style C comments`` are not removed, since Fortran has OPERATOR(//). +* C-like line continuations with backslash-newline can appear in + old-style C comments in directives. +* After ``#define FALSE TRUE``\ , ``.FALSE.`` is replaced by ``.TRUE.``\ ; + i.e., tokenization does not hide the names of operators or logical constants. +* ``#define KWM c`` allows the use of ``KWM`` in column 1 as a fixed form comment + line indicator. 
+* A ``#define`` directive intermixed with continuation lines can't + define a macro that's invoked earlier in the same continued statement. + +Behavior that is not consistent over all extant compilers but which + +probably should be uncontroversial: +----------------------------------- + + +* Invoked macro names can straddle a Fortran line continuation. +* ... unless implicit fixed form card padding intervenes; i.e., + in fixed form, a continued macro name has to be split at column + 72 (or 132). +* Comment lines may appear with continuations in a split macro name. +* Function-like macro invocations can straddle a Fortran fixed form line + continuation between the name and the left parenthesis, and comment and + directive lines can be there too. +* Function-like macro invocations can straddle a Fortran fixed form line + continuation between the parentheses, and comment lines can be there too. +* Macros are not expanded within Hollerith constants or Hollerith + FORMAT edit descriptors. +* Token pasting with ``##`` works in function-like macros. +* Argument stringization with ``#`` works in function-like macros. +* Directives can be capitalized (e.g., ``#DEFINE``\ ) in fixed form. +* Fixed form clipping after column 72 or 132 is done before macro expansion, + not after. +* C-like line continuation with backslash-newline can appear in the name of + a keyword-like macro definition. +* If ``#`` is in column 6 in fixed form, it's a continuation marker, not a + directive indicator. +* ``#define KWM !`` allows KWM to signal a comment. + +Judgement calls, where precedents are unclear: +---------------------------------------------- + + +* Expressions in ``#if`` and ``#elif`` should support both Fortran and C + operators; e.g., ``#if 2 .LT. 3`` should work. +* If a function-like macro does not close its parentheses, line + continuation should be assumed. +* ... 
However, the leading parenthesis has to be on the same line as + the name of the function-like macro, or on a continuation line thereof. +* If macros expand to text containing ``&``\ , it doesn't work as a free form + line continuation marker. +* ``#define c 1`` does not allow a ``c`` in column 1 to be used as a label + in fixed form, rather than as a comment line indicator. +* IBM claims to be ISO C compliant and therefore recognizes trigraph sequences. +* Fortran comments in macro actual arguments should be respected, on + the principle that a macro call should work like a function reference. +* If a ``#define`` or ``#undef`` directive appears among continuation + lines, it may or may not affect text in the continued statement that + appeared before the directive. + +Behavior that few compilers properly support (or none), but should: +------------------------------------------------------------------- + + +* A macro invocation can straddle free form continuation lines in all of their + forms, with continuation allowed in the name, before the arguments, and + within the arguments. +* Directives can be capitalized in free form, too. +* ``__VA_ARGS__`` and ``__VA_OPT__`` work in variadic function-like macros. + +In short, a Fortran preprocessor should work as if: +--------------------------------------------------- + + +#. Fixed form lines are padded up to column 72 (or 132) and clipped thereafter. +#. Fortran comments are removed. +#. C-style line continuations are processed in preprocessing directives. +#. C old-style comments are removed from directives. +#. Fortran line continuations are processed (outside preprocessing directives). + Line continuation rules depend on source form. + Comment lines that are enabled compiler directives have their line + continuations processed. + Conditional compilation preprocessing directives (e.g., ``#if``\ ) may + appear among continuation lines, and have their usual effects upon them. +#. 
Other preprocessing directives are processed and macros expanded. + Along the way, Fortran ``INCLUDE`` lines and preprocessor ``#include`` directives + are expanded, and all these steps applied recursively to the introduced text. +#. Any Fortran comments created by macro replacement are removed. + +Steps 5 and 6 are interleaved with respect to the preprocessing state. +Conditional compilation preprocessing directives always reflect only the macro +definition state produced by the active ``#define`` and ``#undef`` preprocessing directives +that precede them. + +If the source form is changed by means of a compiler directive (i.e., +``!DIR$ FIXED`` or ``FREE``\ ) in an included source file, its effects cease +at the end of that file. + +Last, if the preprocessor is not integrated into the Fortran compiler, +new Fortran continuation line markers should be introduced into the final +text. + +OpenMP-style directives that look like comments are not addressed by +this scheme but are obvious extensions. + +Appendix +======== + +``N`` in the table below means "not supported"; this doesn't +mean a bug, it just means that a particular behavior was +not observed. +``E`` signifies "error reported". + +The abbreviation ``KWM`` stands for "keyword macro" and ``FLM`` means +"function-like macro". + +The first block of tests (\ ``pp0*.F``\ ) are all fixed-form source files; +the second block (\ ``pp1*.F90``\ ) are free-form source files. + +.. code-block:: + + f18 + | pgfortran + | | ifort + | | | gfortran + | | | | xlf + | | | | | nagfor + | | | | | | + . . . . . . pp001.F keyword macros + . . . . . . pp002.F #undef + . . . . . . pp003.F function-like macros + . . . . . . pp004.F KWMs case-sensitive + . N . N N . pp005.F KWM split across continuation, implicit padding + . N . N N . pp006.F ditto, but with intervening *comment line + N N N N N N pp007.F KWM split across continuation, clipped after column 72 + . . . . . . pp008.F KWM with spaces in name at invocation NOT replaced + . 
N . N N . pp009.F FLM call split across continuation, implicit padding + . N . N N . pp010.F ditto, but with intervening *comment line + N N N N N N pp011.F FLM call name split across continuation, clipped + . N . N N . pp012.F FLM call name split across continuation + . E . N N . pp013.F FLM call split between name and ( + . N . N N . pp014.F FLM call split between name and (, with intervening *comment + . E . N N . pp015.F FLM call split between name and (, clipped + . E . N N . pp016.F FLM call split between name and ( and in argument + . . . . . . pp017.F KWM rescan + . . . . . . pp018.F KWM rescan with #undef (so rescan is after expansion) + . . . . . . pp019.F FLM rescan + . . . . . . pp020.F FLM expansion of argument + . . . . . . pp021.F KWM NOT expanded in 'literal' + . . . . . . pp022.F KWM NOT expanded in "literal" + . . E E . E pp023.F KWM NOT expanded in 9HHOLLERITH literal + . . . E . . pp024.F KWM NOT expanded in Hollerith in FORMAT + . . . . . . pp025.F KWM expansion is before token pasting due to fixed-form space removal + . . . E . E pp026.F ## token pasting works in FLM + E . . E E . pp027.F #DEFINE works in fixed form + . N . N N . pp028.F fixed-form clipping done before KWM expansion on source line + . . . . . . pp029.F newline allowed in #define + . . . . . . pp030.F /* C comment */ erased from #define + E E E E E E pp031.F // C++ comment NOT erased from #define + . . . . . . pp032.F /* C comment */ newline erased from #define + . . . . . . pp033.F /* C comment newline */ erased from #define + . . . . . N pp034.F newline allowed in name on KWM definition + . E . E E . pp035.F #if 2 .LT. 3 works + . . . . . . pp036.F #define FALSE TRUE ... .FALSE. -> .TRUE. + N N N N N N pp037.F fixed-form clipping NOT applied to #define + . . E . E E pp038.F FLM call with closing ')' on next line (not a continuation) + E . E . E E pp039.F FLM call with '(' on next line (not a continuation) + . . . . . . 
pp040.F #define KWM c, then KWM works as comment line initiator + E . E . . E pp041.F use KWM expansion as continuation indicators + N N N . . N pp042.F #define c 1, then use c as label in fixed-form + . . . . N . pp043.F #define with # in column 6 is a continuation line in fixed-form + E . . . . . pp044.F #define directive amid continuations + . . . . . . pp101.F90 keyword macros + . . . . . . pp102.F90 #undef + . . . . . . pp103.F90 function-like macros + . . . . . . pp104.F90 KWMs case-sensitive + . N N N N N pp105.F90 KWM call name split across continuation, with leading & + . N N N N N pp106.F90 ditto, with & ! comment + N N E E N . pp107.F90 KWM call name split across continuation, no leading &, with & ! comment + N N E E N . pp108.F90 ditto, but without & ! comment + . N N N N N pp109.F90 FLM call name split with leading & + . N N N N N pp110.F90 ditto, with & ! comment + N N E E N . pp111.F90 FLM call name split across continuation, no leading &, with & ! comment + N N E E N . pp112.F90 ditto, but without & ! comment + . N N N N E pp113.F90 FLM call split across continuation between name and (, leading & + . N N N N E pp114.F90 ditto, with & ! comment, leading & + N N N N N . pp115.F90 ditto, with & ! comment, no leading & + N N N N N . pp116.F90 FLM call split between name and (, no leading & + . . . . . . pp117.F90 KWM rescan + . . . . . . pp118.F90 KWM rescan with #undef, proving rescan after expansion + . . . . . . pp119.F90 FLM rescan + . . . . . . pp120.F90 FLM expansion of argument + . . . . . . pp121.F90 KWM NOT expanded in 'literal' + . . . . . . pp122.F90 KWM NOT expanded in "literal" + . . E E . E pp123.F90 KWM NOT expanded in Hollerith literal + . . E E . E pp124.F90 KWM NOT expanded in Hollerith in FORMAT + E . . E E . pp125.F90 #DEFINE works in free form + . . . . . . pp126.F90 newline works in #define + N . E . E E pp127.F90 FLM call with closing ')' on next line (not a continuation) + E . E . 
E E pp128.F90 FLM call with '(' on next line (not a continuation) + . . N . . N pp129.F90 #define KWM !, then KWM works as comment line initiator + E . E . . E pp130.F90 #define KWM &, use for continuation w/o pasting (ifort and nag seem to continue #define) diff --git a/flang/docs/PullRequestChecklist.rst b/flang/docs/PullRequestChecklist.rst new file mode 100644 --- /dev/null +++ b/flang/docs/PullRequestChecklist.rst @@ -0,0 +1,47 @@ +Pull request checklist +====================== + +Please review the following items before submitting a pull request. This list +can also be used when reviewing pull requests. + + +* Verify that new files have a license with correct file name. +* Run ``git diff`` on all modified files to look for spurious changes such as + ``#include ``. +* If you added code that causes the compiler to emit a new error message, make + sure that you also added a test that causes that error message to appear + and verifies its correctness. +* Annotate the code and tests with appropriate references to constraint and + requirement numbers from the Fortran standard. Do not include the text of + the constraint or requirement, just its number. +* Alphabetize arbitrary lists of names. +* Check dereferences of pointers and optionals where necessary. +* Ensure that the scopes of all functions and variables are as local as + possible. +* Try to make all functions fit on a screen (40 lines). +* Build and test with both GNU and clang compilers. +* When submitting an update to a pull request, review previous pull request + comments and make sure that you've actually made all of the changes that + were requested. + +Follow the style guide +---------------------- + +The following items are taken from the `C++ style guide `_. But +even though I've read the style guide, they regularly trip me up. + + +* Run clang-format using the git-clang-format script from LLVM HEAD. +* Make sure that all source lines have 80 or fewer characters. 
Note that + clang-format will do this for most code. But you may need to break up long + strings. +* Review declarations for proper use of ``constexpr`` and ``const``. +* Follow the C++ `naming guidelines `_. +* Ensure that the names evoke their purpose and are consistent with existing code. +* Use braced initializers. +* Review pointer and reference types to make sure that you're using them + appropriately. Note that the `C++ style guide `_ contains a + section that describes all of the pointer types along with their + characteristics. +* Declare non-member functions ``static`` when possible. Prefer + ``static`` functions over functions in anonymous namespaces. diff --git a/flang/documentation/RuntimeDescriptor.md b/flang/docs/RuntimeDescriptor.rst rename from flang/documentation/RuntimeDescriptor.md rename to flang/docs/RuntimeDescriptor.rst --- a/flang/documentation/RuntimeDescriptor.md +++ b/flang/docs/RuntimeDescriptor.rst @@ -1,12 +1,6 @@ - - -## Concept +Concept +------- + The properties that characterize data values and objects in Fortran programs must sometimes be materialized when the program runs. @@ -26,47 +20,56 @@ to serve the needs of compiled code and the run time support library. Previous implementations of Fortran have typically defined a small -sheaf of _descriptor_ data structures for this purpose, and attached +sheaf of *descriptor* data structures for this purpose, and attached these descriptors as additional hidden arguments, type components, and local variables so as to convey dynamic characteristics between subprograms and between user code and the run-time support library. -### References +References +^^^^^^^^^^ + References are to the 12-2017 draft of the Fortran 2018 standard (N2146). 
Section 15.4.2.2 can be interpreted as a decent list of things that might need descriptors or other hidden state passed across a subprogram call, since such features (apart from assumed-length -`CHARACTER` function results) trigger a requirement for the +``CHARACTER`` function results) trigger a requirement for the subprogram to have an explicit interface visible to their callers. Section 15.5.2 has good laundry lists of situations that can arise across subprogram call boundaries. -## A survey of dynamic characteristics - -### Length of assumed-length `CHARACTER` function results (B.3.6) -``` -CHARACTER*8 :: FOO -PRINT *, FOO('abcdefghijklmnopqrstuvwxyz') -... -CHARACTER*(*) FUNCTION FOO(STR) - CHARACTER*26 STR - FOO=STR -END -``` - -prints `abcdefgh` because the length parameter of the character type -of the result of `FOO` is passed across the call -- even in the absence +A survey of dynamic characteristics +----------------------------------- + +Length of assumed-length ``CHARACTER`` function results (B.3.6) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: + + CHARACTER*8 :: FOO + PRINT *, FOO('abcdefghijklmnopqrstuvwxyz') + ... + CHARACTER*(*) FUNCTION FOO(STR) + CHARACTER*26 STR + FOO=STR + END + +prints ``abcdefgh`` because the length parameter of the character type +of the result of ``FOO`` is passed across the call -- even in the absence of an explicit interface! -### Assumed length type parameters (7.2) -Dummy arguments and associate names for `SELECT TYPE` can have assumed length +Assumed length type parameters (7.2) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Dummy arguments and associate names for ``SELECT TYPE`` can have assumed length type parameters, which are denoted by asterisks (not colons). Their values come from actual arguments or the associated expression (resp.). 
-### Explicit-shape arrays (8.5.8.2) +Explicit-shape arrays (8.5.8.2) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + The expressions used for lower and upper bounds must be captured and remain invariant over the scope of an array, even if they contain references to variables that are later modified. @@ -75,12 +78,15 @@ and components of derived type (using specification expressions in terms of constants and KIND type parameters). -### Leading dimensions of assumed-size arrays (8.5.8.5) -``` -SUBROUTINE BAR(A) - REAL A(2,3,*) -END -``` +Leading dimensions of assumed-size arrays (8.5.8.5) +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +.. code-block:: + + SUBROUTINE BAR(A) + REAL A(2,3,*) + END + The total size and final dimension's extent do not constitute dynamic properties. The called subprogram has no means to extract the extent of the @@ -94,33 +100,42 @@ those expressions have their values modified later. This is similar to the requirements for an explicit-shape array. -### Some function results -1. Deferred-shape -2. Deferred length type parameter values -3. Stride information for `POINTER` results +Some function results +^^^^^^^^^^^^^^^^^^^^^ + + +#. Deferred-shape +#. Deferred length type parameter values +#. Stride information for ``POINTER`` results -Note that while function result variables can have the `ALLOCATABLE` +Note that while function result variables can have the ``ALLOCATABLE`` attribute, the function itself and the value returned to the caller do not possess the attribute. -### Assumed-shape arrays +Assumed-shape arrays +^^^^^^^^^^^^^^^^^^^^ + The extents of the dimensions of assumed-shape dummy argument arrays are conveyed from those of the actual effective arguments. The bounds, however, are not. The called subprogram can define the lower bound to be a value other than 1, but that is a local effect only. 
-### Deferred-shape arrays -The extents and bounds of `POINTER` and `ALLOCATABLE` arrays are -established by pointer assignments and `ALLOCATE` statements. -Note that dummy arguments and function results that are `POINTER` -or `ALLOCATABLE` can be deferred-shape, not assumed-shape -- one cannot +Deferred-shape arrays +^^^^^^^^^^^^^^^^^^^^^ + +The extents and bounds of ``POINTER`` and ``ALLOCATABLE`` arrays are +established by pointer assignments and ``ALLOCATE`` statements. +Note that dummy arguments and function results that are ``POINTER`` +or ``ALLOCATABLE`` can be deferred-shape, not assumed-shape -- one cannot supply a lower bound expression as a local effect. -### Strides +Strides +^^^^^^^ + Some arrays can have discontiguous (or negative) strides. These include assumed-shape dummy arguments and deferred-shape -`POINTER` variables, components, and function results. +``POINTER`` variables, components, and function results. Fortran disallows some conceivable cases that might otherwise require implied strides, such as passing an array of an extended @@ -128,28 +143,30 @@ nonpolymorphic dummy array of a base type, or the similar case of pointer assignment to a base of an extended derived type. -Other arrays, including `ALLOCATABLE`, can be assured to +Other arrays, including ``ALLOCATABLE``\ , can be assured to be contiguous, and do not necessarily need to manage or convey dynamic stride information. -`CONTIGUOUS` dummy arguments and `POINTER` arrays need not +``CONTIGUOUS`` dummy arguments and ``POINTER`` arrays need not record stride information either. -(The standard notes that a `CONTIGUOUS POINTER` occupies a +(The standard notes that a ``CONTIGUOUS POINTER`` occupies a number of storage units that is distinct from that required -to hold a non-`CONTIGUOUS` pointer.) +to hold a non-\ ``CONTIGUOUS`` pointer.) 
-Note that Fortran distinguishes the `CONTIGUOUS` attribute from -the concept of being known or required to be _simply contiguous_ (9.5.4), -which includes `CONTIGUOUS` entities as well as many others, and -the concept of actually _being_ contiguous (8.5.7) during execution. +Note that Fortran distinguishes the ``CONTIGUOUS`` attribute from +the concept of being known or required to be *simply contiguous* (9.5.4), +which includes ``CONTIGUOUS`` entities as well as many others, and +the concept of actually *being* contiguous (8.5.7) during execution. I believe that the property of being simply contiguous implies that an entity is known at compilation time to not require the use or maintenance of hidden stride values. -### Derived type component initializers +Derived type component initializers +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + Fortran allows components of derived types to be declared with initial values that are to be assigned to the components when an instance of the derived type is created. -These include `ALLOCATABLE` components, which are always initialized +These include ``ALLOCATABLE`` components, which are always initialized to a deallocated state. These can be implemented with constructor subroutines, inline @@ -165,15 +182,19 @@ with assignments to uninitialized derived type instances from static constant initializers. -### Polymorphic `CLASS()`, `CLASS(*)`, and `TYPE(*)` -Type identification for `SELECT TYPE`. +Polymorphic ``CLASS()``\ , ``CLASS(*)``\ , and ``TYPE(*)`` +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Type identification for ``SELECT TYPE``. Default initializers (see above). -Offset locations of `ALLOCATABLE` and polymorphic components. -Presence of `FINAL` procedures. +Offset locations of ``ALLOCATABLE`` and polymorphic components. +Presence of ``FINAL`` procedures. Mappings to overridable type-bound specific procedures. 
-### Deferred length type parameters -Derived types with length type parameters, and `CHARACTER`, may be used +Deferred length type parameters +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +Derived types with length type parameters, and ``CHARACTER``\ , may be used with the values of those parameters deferred to execution. Their actual values must be maintained as characteristics of the dynamic type that is associated with a value or object @@ -181,43 +202,51 @@ A single copy of the deferred length type parameters suffices for all of the elements of an array of that parameterized derived type. -### Components whose types and/or shape depends on length type parameters +Components whose types and/or shape depends on length type parameters +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + Non-pointer, non-allocatable components whose types or shapes are expressed in terms of length type parameters will probably have to be implemented as -if they had deferred type and/or shape and were `ALLOCATABLE`. +if they had deferred type and/or shape and were ``ALLOCATABLE``. The derived type instance constructor must allocate them and possibly initialize them; the instance destructor must deallocate them. -### Assumed rank arrays +Assumed rank arrays +^^^^^^^^^^^^^^^^^^^ + Rank is almost always known at compilation time and would be redundant in most circumstances if also managed dynamically. -`DIMENSION(..)` dummy arguments (8.5.8.7), however, are a recent feature +``DIMENSION(..)`` dummy arguments (8.5.8.7), however, are a recent feature with which the rank of a whole array is dynamic outside the cases of -a `SELECT RANK` construct. +a ``SELECT RANK`` construct. The lower bounds of the dimensions of assumed rank arrays are always 1. 
-### Cached invariant subexpressions for addressing +Cached invariant subexpressions for addressing +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + Implementations of Fortran have often maintained precalculated integer values to accelerate subscript computations. -For example, given `REAL*8 :: A(2:4,3:5)`, the data reference `A(I,J)` -resolves to something like `&A + 8*((I-2)+3*(J-3))`, and this can be -effectively reassociated to `&A - 88 + 8*I + 24*J` -or `&A - 88 + 8*(I + 3*J)`. +For example, given ``REAL*8 :: A(2:4,3:5)``\ , the data reference ``A(I,J)`` +resolves to something like ``&A + 8*((I-2)+3*(J-3))``\ , and this can be +effectively reassociated to ``&A - 88 + 8*I + 24*J`` +or ``&A - 88 + 8*(I + 3*J)``. When the offset term and coefficients are not compile-time constants, they are at least invariant and can be precomputed. -In the cases of dummy argument arrays, `POINTER`, and `ALLOCATABLE`, +In the cases of dummy argument arrays, ``POINTER``\ , and ``ALLOCATABLE``\ , these addressing invariants could be managed alongside other dynamic information like deferred extents and lower bounds to avoid their recalculation. It's not clear that it's worth the trouble to do so, since the expressions are invariant and cheap. -### Coarray state (8.5.6) -A _coarray_ is an `ALLOCATABLE` variable or component, or statically -allocated variable (`SAVE` attribute explicit or implied), or dummy +Coarray state (8.5.6) +^^^^^^^^^^^^^^^^^^^^^ + +A *coarray* is an ``ALLOCATABLE`` variable or component, or statically +allocated variable (\ ``SAVE`` attribute explicit or implied), or dummy argument whose ultimate effective argument is one of such things. Each image in a team maintains its portion of each coarray and can @@ -233,14 +262,18 @@ given coarray on a given image, so long as it could be acquired in a "one-sided" fashion.) 
-### Presence of `OPTIONAL` dummy arguments +Presence of ``OPTIONAL`` dummy arguments +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + Typically indicated with null argument addresses. -Note that `POINTER` and `ALLOCATABLE` objects can be passed to -non-`POINTER` non-`ALLOCATABLE` dummy arguments, and their +Note that ``POINTER`` and ``ALLOCATABLE`` objects can be passed to +non-\ ``POINTER`` non-\ ``ALLOCATABLE`` dummy arguments, and their association or allocation status (resp.) determines the presence of the dummy argument. -### Stronger contiguity enforcement or indication +Stronger contiguity enforcement or indication +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + Some implementations of Fortran guarantee that dummy argument arrays are, or have been made to be, contiguous on one or more dimensions when the language does not require them to be so (8.5.7 p2). @@ -257,23 +290,30 @@ whether that difference exactly matches the byte size of the type times the product of the extents of any prior dimensions. -### Host instances for dummy procedures and procedure pointers +Host instances for dummy procedures and procedure pointers +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + A static link or other means of accessing the imported state of the host procedure must be available when an internal procedure is used as an actual argument or as a pointer assignment target. -### Alternate returns +Alternate returns +^^^^^^^^^^^^^^^^^ + Subroutines (only) with alternate return arguments need a means, such as the otherwise unused function return value, by which -to distinguish and identify the use of an alternate `RETURN` statement. +to distinguish and identify the use of an alternate ``RETURN`` statement. The protocol can be a simple nonzero integer that drives a switch in the caller, or the caller can pass multiple return addresses as arguments for the callee to substitute on the stack for the original -return address in the event of an alternate `RETURN`. 
+return address in the event of an alternate ``RETURN``. + +Implementation options +---------------------- -## Implementation options +A note on array descriptions +^^^^^^^^^^^^^^^^^^^^^^^^^^^^ -### A note on array descriptions Some arrays require dynamic management of distinct combinations of values per dimension. @@ -287,32 +327,37 @@ lower bounds. So there are examples of dimensions with - * extent only (== upper bound): `CONTIGUOUS` assumed-shape, explicit shape and multidimensional assumed-size with constant lower bound - * lower bound and either extent or upper bound: `ALLOCATABLE`, `CONTIGUOUS` `POINTER`, general explicit-shape and multidimensional assumed-size - * extent (== upper bound) and stride: general (non-`CONTIGUOUS`) assumed-shape - * lower bound, stride, and either extent or upper bound: general (non-`CONTIGUOUS`) `POINTER`, assumed-rank + + +* extent only (== upper bound): ``CONTIGUOUS`` assumed-shape, explicit shape and multidimensional assumed-size with constant lower bound +* lower bound and either extent or upper bound: ``ALLOCATABLE``\ , ``CONTIGUOUS`` ``POINTER``\ , general explicit-shape and multidimensional assumed-size +* extent (== upper bound) and stride: general (non-\ ``CONTIGUOUS``\ ) assumed-shape +* lower bound, stride, and either extent or upper bound: general (non-\ ``CONTIGUOUS``\ ) ``POINTER``\ , assumed-rank and these cases could be accompanied by precomputed invariant addressing subexpressions to accelerate indexing calculations.
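As a rough illustration of the precomputed addressing invariants mentioned above, the following C++ sketch reworks the earlier ``REAL*8 :: A(2:4,3:5)`` example. It compares the direct subscript computation ``8*((I-2)+3*(J-3))`` with the reassociated form ``-88 + 8*I + 24*J``. The names ``Dim``, ``Invariants``, ``Precompute``, ``DirectOffset``, and ``CachedOffset`` are hypothetical and chosen only for this sketch; they are not part of any existing runtime.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical per-dimension record; names are illustrative only.
struct Dim {
  std::int64_t lowerBound;
  std::int64_t extent;
};

// Hypothetical cache of the invariant parts of a rank-2 address computation.
struct Invariants {
  std::int64_t bias; // constant byte offset folded out of the subscripts
  std::int64_t coeff1; // byte multiplier for the first subscript
  std::int64_t coeff2; // byte multiplier for the second subscript
};

// Direct form: elemBytes * ((i - lb1) + extent1 * (j - lb2)).
std::int64_t DirectOffset(std::int64_t elemBytes, const Dim &d1, const Dim &d2,
    std::int64_t i, std::int64_t j) {
  return elemBytes * ((i - d1.lowerBound) + d1.extent * (j - d2.lowerBound));
}

// Computed once per array; depends only on bounds, extents, and element size.
Invariants Precompute(std::int64_t elemBytes, const Dim &d1, const Dim &d2) {
  std::int64_t c1{elemBytes};
  std::int64_t c2{elemBytes * d1.extent};
  return {-(c1 * d1.lowerBound + c2 * d2.lowerBound), c1, c2};
}

// Reassociated form: bias + coeff1 * i + coeff2 * j.
std::int64_t CachedOffset(
    const Invariants &inv, std::int64_t i, std::int64_t j) {
  return inv.bias + inv.coeff1 * i + inv.coeff2 * j;
}
```

For ``A(2:4,3:5)`` with 8-byte elements, ``Precompute`` yields a bias of -88 and coefficients 8 and 24, matching the reassociated expression in the text. Whether caching these values is worthwhile remains an open question, as noted earlier, since they are cheap to recompute.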
-### Interoperability requirements +Interoperability requirements +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Fortran 2018 requires that a Fortran implementation supply a header file -`ISO_Fortran_binding.h` for use in C and C++ programs that defines and -implements an interface to Fortran objects from the _interoperable_ +``ISO_Fortran_binding.h`` for use in C and C++ programs that defines and +implements an interface to Fortran objects from the *interoperable* subset of Fortran objects and their types suitable for use when those objects are passed to C functions. This interface mandates a fat descriptor that is passed by address, containing (at least) - * a data base address - * explicit rank and type - * flags to distinguish `POINTER` and `ALLOCATABLE` - * elemental byte size, and - * (per-dimension) lower bound, extent, and byte stride + + +* a data base address +* explicit rank and type +* flags to distinguish ``POINTER`` and ``ALLOCATABLE`` +* elemental byte size, and +* (per-dimension) lower bound, extent, and byte stride The requirements on the interoperability API do not mandate any support for features like derived type component initialization, -automatic deallocation of `ALLOCATABLE` components, finalization, +automatic deallocation of ``ALLOCATABLE`` components, finalization, derived type parameters, data contiguity flags, &c. But neither does the Standard preclude inclusion of additional interfaces to describe and support such things. @@ -320,12 +365,14 @@ Given a desire to fully support the Fortran 2018 language, we need to either support the interoperability requirements as a distinct specialization of the procedure call protocol, or use the -`ISO_Fortran_binding.h` header file requirements as a subset basis for a +``ISO_Fortran_binding.h`` header file requirements as a subset basis for a complete implementation that adds representations for all the missing capabilities, which would be isolated and named so as to prevent user C code from relying upon them. 
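To make the listed descriptor contents more concrete, here is a minimal C++ sketch of a structure carrying the same kinds of fields, together with an element address computation that uses the per-dimension lower bound and byte stride. This is an illustrative stand-in under stated assumptions: ``SketchDescriptor``, ``SketchDim``, ``ElementAddress``, the field names, and the ``typeCode`` encoding are all hypothetical and are not the layout that ``ISO_Fortran_binding.h`` actually defines.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Assumed maximum rank for this sketch (Fortran 2018 permits rank 15).
constexpr int kMaxRank{15};

// Hypothetical per-dimension triple: lower bound, extent, byte stride.
struct SketchDim {
  std::int64_t lowerBound;
  std::int64_t extent;
  std::int64_t byteStride;
};

// Hypothetical "fat" descriptor with the kinds of fields listed above.
struct SketchDescriptor {
  void *baseAddr; // data base address
  std::size_t elemBytes; // elemental byte size
  int rank; // explicit rank
  int typeCode; // explicit type (encoding is an assumption)
  bool isPointer; // flag distinguishing POINTER
  bool isAllocatable; // flag distinguishing ALLOCATABLE
  SketchDim dim[kMaxRank];
};

// Address of the element selected by one subscript per dimension.
void *ElementAddress(
    const SketchDescriptor &d, const std::int64_t *subscripts) {
  char *p{static_cast<char *>(d.baseAddr)};
  for (int k{0}; k < d.rank; ++k) {
    p += (subscripts[k] - d.dim[k].lowerBound) * d.dim[k].byteStride;
  }
  return p;
}
```

Because each dimension carries its own byte stride, the same computation covers both contiguous arrays and strided sections. Contiguity flags and derived type information are deliberately absent here, mirroring the gaps the text attributes to the mandated interface.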
-### Design space +Design space +^^^^^^^^^^^^ + There is a range of possible options for representing the properties of values and objects during the execution of Fortran programs. @@ -359,28 +406,31 @@ or by means of a combination of the two approaches? Consider how to implement the following: -``` -TYPE :: LIST - REAL :: HEAD - TYPE(LIST), ALLOCATABLE :: REST -END TYPE LIST -TYPE(LIST), ALLOCATABLE :: A, B -... -A = B -``` - -Fortran requires that `A`'s arbitrary-length linked list be deleted and -replaced with a "deep copy" of `B`'s. + +.. code-block:: + + TYPE :: LIST + REAL :: HEAD + TYPE(LIST), ALLOCATABLE :: REST + END TYPE LIST + TYPE(LIST), ALLOCATABLE :: A, B + ... + A = B + +Fortran requires that ``A``\ 's arbitrary-length linked list be deleted and +replaced with a "deep copy" of ``B``\ 's. So either a complicated pair of loops must be generated by the compiler, or a sophisticated run time support library needs to be driven with an expressive representation of type information. -## Proposal -We need to write `ISO_Fortran_binding.h` in any event. +Proposal +-------- + +We need to write ``ISO_Fortran_binding.h`` in any event. It is a header that is published for use in user C code for interoperation with compiled Fortran and the Fortran run time support library. -There is a sole descriptor structure defined in `ISO_Fortran_binding.h`. +There is a sole descriptor structure defined in ``ISO_Fortran_binding.h``. It is suitable for characterizing scalars and array sections of intrinsic types. It is essentially a "fat" data pointer that encapsulates a raw data pointer, @@ -392,13 +442,13 @@ could be associated with dynamic base addresses. The F18 runtime cannot use just the mandated interoperable -`struct CFI_cdesc_t` argument descriptor structure as its +``struct CFI_cdesc_t`` argument descriptor structure as its all-purpose data descriptor. It has no information about derived type components, overridable type-bound procedure bindings, type parameters, &c. 
However, we could extend the standard interoperable argument descriptor. -The `struct CFI_cdesc_t` structure is not of fixed size, but we +The ``struct CFI_cdesc_t`` structure is not of fixed size, but we can efficiently locate the first address after an instance of the standard descriptor and attach our own data record there to hold what we need. @@ -407,12 +457,12 @@ of the addenda. The definitions of our additional run time data structures must -appear in a header file that is distinct from `ISO_Fortran_binding.h`, +appear in a header file that is distinct from ``ISO_Fortran_binding.h``\ , and they should never be used by user applications. This expanded descriptor structure can serve, at least initially for -simplicity, as the sole representation of `POINTER` variables and -components, `ALLOCATABLE` variables and components, and derived type +simplicity, as the sole representation of ``POINTER`` variables and +components, ``ALLOCATABLE`` variables and components, and derived type instances, including length parameter values. An immediate concern with this concept is the amount of space and @@ -420,7 +470,7 @@ needing a descriptor would have to be accompanied by an instance of the general descriptor. (In the linked list example close above, what could be done with a -single pointer for the `REST` component would become at least +single pointer for the ``REST`` component would become at least a four-word dynamic structure.) This concern is amplified when derived type instances are allocated as arrays, since the overhead is per-element. diff --git a/flang/docs/Semantics.rst b/flang/docs/Semantics.rst new file mode 100644 --- /dev/null +++ b/flang/docs/Semantics.rst @@ -0,0 +1,168 @@ +Semantic Analysis +================= + +The semantic analysis pass determines if a syntactically correct Fortran +program is legal by enforcing the constraints of the language.
+ +The input is a parse tree with a ``Program`` node at the root; +and a "cooked" character stream, a contiguous stream of characters +containing a normalized form of the Fortran source. + +The semantic analysis pass takes a parse tree for a syntactically +correct Fortran program and determines whether it is legal by enforcing +the constraints of the language. + +If the program is not legal, the results of the semantic pass will be a list of +errors associated with the program. + +If the program is legal, the semantic pass will produce a (possibly modified) +parse tree for the semantically correct program with each name mapped to a symbol +and each expression fully analyzed. + +All user errors are detected either prior to or during semantic analysis. +After it completes successfully the program should compile with no error messages. +There may still be warnings or informational messages. + +Phases of Semantic Analysis +--------------------------- + + +#. `Validate labels <#validate-labels>`_ - + Check all constraints on labels and branches +#. `Rewrite DO loops <#rewrite-do-loops>`_ - + Convert all occurrences of ``LabelDoStmt`` to ``DoConstruct``. +#. `Name resolution <#name-resolution>`_ - + Analyze names and declarations, build a tree of Scopes containing Symbols, + and fill in the ``Name::symbol`` data member in the parse tree +#. `Rewrite parse tree <#rewrite-parse-tree>`_ - + Fix incorrect parses based on symbol information +#. `Expression analysis <#expression-analysis>`_ - + Analyze all expressions in the parse tree and fill in ``Expr::typedExpr`` and + ``Variable::typedExpr`` with analyzed expressions; fix incorrect parses + based on the result of this analysis +#. `Statement semantics <#statement-semantics>`_ - + Perform remaining semantic checks on the execution parts of subprograms +#. 
`Write module files <#write-module-files>`_ - + If no errors have occurred, write out ``.mod`` files for modules and submodules + +If phase 1 or phase 2 encounter an error on any of the program units, +compilation terminates. Otherwise, phases 3-6 are all performed even if +errors occur. +Module files are written (phase 7) only if there are no errors. + +Validate labels +^^^^^^^^^^^^^^^ + +Perform semantic checks related to labels and branches: + + +* check that any labels that are referenced are defined and in scope +* check branches into loop bodies +* check that labeled ``DO`` loops are properly nested +* check labels in data transfer statements + +Rewrite DO loops +^^^^^^^^^^^^^^^^ + +This phase normalizes the parse tree by removing all unstructured ``DO`` loops +and replacing them with ``DO`` constructs. + +Name resolution +^^^^^^^^^^^^^^^ + +The name resolution phase walks the parse tree and constructs the symbol table. + +The symbol table consists of a tree of ``Scope`` objects rooted at the global scope. +The global scope is owned by the ``SemanticsContext`` object. +It contains a ``Scope`` for each program unit in the compilation. + +Each ``Scope`` in the scope tree contains child scopes representing other scopes +lexically nested in it. +Each ``Scope`` also contains a map of ``CharBlock`` to ``Symbol`` representing names +declared in that scope. (All names in the symbol table are represented as +``CharBlock`` objects, i.e. as substrings of the cooked character stream.) + +All ``Symbol`` objects are owned by the symbol table data structures. +They should be accessed as ``Symbol *`` or ``Symbol &`` outside of the symbol +table classes as they can't be created, copied, or moved. +The ``Symbol`` class has functions and data common across all symbols, and a +``details`` field that contains more information specific to that type of symbol. +Many symbols also have types, represented by ``DeclTypeSpec``. +Types are also owned by scopes. 
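The ownership structure described above can be pictured with a small C++ sketch: a tree of scopes, each holding child scopes and a map from names (substrings of the cooked character stream) to symbols that cannot be copied or moved. The types below are simplified, hypothetical stand-ins written for this illustration; they are not flang's actual ``Scope``, ``Symbol``, or ``CharBlock`` definitions, and ``MakeChild``, ``Declare``, and ``Find`` are assumed names.

```cpp
#include <cassert>
#include <list>
#include <map>
#include <string>
#include <string_view>

// Stand-in for a name: a substring of the cooked character stream.
using CharBlock = std::string_view;

// Stand-in symbol: owned by its scope, neither copyable nor movable.
struct Symbol {
  explicit Symbol(CharBlock n) : name{n} {}
  Symbol(const Symbol &) = delete;
  Symbol &operator=(const Symbol &) = delete;
  CharBlock name;
};

// Stand-in scope: owns its child scopes and the symbols declared in it.
class Scope {
public:
  // Adds a lexically nested scope and returns a reference to it.
  Scope &MakeChild() { return children_.emplace_back(); }
  // Declares a name in this scope (or returns the existing symbol).
  Symbol &Declare(CharBlock name) {
    return symbols_.try_emplace(name, name).first->second;
  }
  // Looks a name up in this scope only; null when it is not declared here.
  Symbol *Find(CharBlock name) {
    auto it{symbols_.find(name)};
    return it == symbols_.end() ? nullptr : &it->second;
  }

private:
  std::list<Scope> children_;
  std::map<CharBlock, Symbol> symbols_;
};
```

Since the symbols live inside the map and are never copied, callers hold them only as ``Symbol &`` or ``Symbol *``, which is the access pattern the text prescribes. The sketch omits symbol details and types entirely.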
+ +Name resolution happens on the parse tree in this order: + + +#. Process the specification of a program unit: + + #. Create a new scope for the unit + #. Create a symbol for each contained subprogram containing just the name + #. Process the opening statement of the unit (\ ``ModuleStmt``\ , ``FunctionStmt``\ , etc.) + #. Process the specification part of the unit + +#. Apply the same process recursively to nested subprograms +#. Process the execution part of the program unit +#. Process the execution parts of nested subprograms recursively + +After the completion of this phase, every ``Name`` corresponds to a ``Symbol`` +unless an error occurred. + +Rewrite parse tree +^^^^^^^^^^^^^^^^^^ + +The parser cannot build a completely correct parse tree without symbol information. +This phase corrects mis-parses based on symbols: + + +* Array element assignments may be parsed as statement functions: ``a(i) = ...`` +* Namelist group names without ``NML=`` may be parsed as format expressions +* A file unit number expression may be parsed as a character variable + +This phase also produces an internal error if it finds a ``Name`` that does not +have its ``symbol`` data member filled in. This error is suppressed if other +errors have occurred because in that case a ``Name`` corresponding to an erroneous +symbol may not be resolved. + +Expression analysis +^^^^^^^^^^^^^^^^^^^ + +Expressions that occur in the specification part are analyzed during name +resolution, for example, initial values, array bounds, type parameters. +Any remaining expressions are analyzed in this phase. + +For each ``Variable`` and top-level ``Expr`` (i.e. one that is not nested below +another ``Expr`` in the parse tree) the analyzed form of the expression is saved +in the ``typedExpr`` data member. After this phase has completed, the analyzed +expression can be accessed using ``semantics::GetExpr()``. 
+ +This phase also corrects mis-parses based on the result of expression analysis: + + +* An expression like ``a(b)`` is parsed as a function reference but may need + to be rewritten to an array element reference (if ``a`` is an object entity) + or to a structure constructor (if ``a`` is a derived type) +* An expression like ``a(b:c)`` is parsed as an array section but may need to be + rewritten as a substring if ``a`` is an object with type CHARACTER + +Statement semantics +^^^^^^^^^^^^^^^^^^^ + +Multiple independent checkers driven by the ``SemanticsVisitor`` framework +perform the remaining semantic checks. +By this phase, all names and expressions that can be successfully resolved +have been. But there may be names without symbols or expressions without +analyzed form if errors occurred earlier. + +Write module files +^^^^^^^^^^^^^^^^^^ + +Separate compilation information is written out on successful compilation +of modules and submodules. These are used as input to name resolution +in program units that ``USE`` the modules. + +Module files are stripped-down Fortran source for the module. +Parts that aren't needed to compile dependent program units (e.g. action statements) +are omitted. + +The module file for module ``m`` is named ``m.mod`` and the module file for +submodule ``s`` of module ``m`` is named ``m-s.mod``. diff --git a/flang/documentation/f2018-grammar.txt b/flang/docs/f2018-grammar.txt rename from flang/documentation/f2018-grammar.txt rename to flang/docs/f2018-grammar.txt diff --git a/flang/documentation/flang-c-style.el b/flang/docs/flang-c-style.el rename from flang/documentation/flang-c-style.el rename to flang/docs/flang-c-style.el diff --git a/flang/documentation/BijectiveInternalNameUniquing.md b/flang/documentation/BijectiveInternalNameUniquing.md deleted file mode 100644 --- a/flang/documentation/BijectiveInternalNameUniquing.md +++ /dev/null @@ -1,156 +0,0 @@ -## Bijective Internal Name Uniquing - -FIR has a flat namespace.
No two objects may have the same name at -the module level. (These would be functions, globals, etc.) -This necessitates some sort of encoding scheme to unique -symbols from the front-end into FIR. - -Another requirement is -to be able to reverse these unique names and recover the associated -symbol in the symbol table. - -Fortran is case insensitive, which allows the compiler to convert the -user's identifiers to all lower case. Such a universal conversion implies -that all upper case letters are available for use in uniquing. - -### Prefix `_Q` - -All uniqued names have the prefix sequence `_Q` to indicate the name has -been uniqued. (Q is chosen because it is a -[low frequency letter](http://pi.math.cornell.edu/~mec/2003-2004/cryptography/subs/frequencies.html) -in English.) - -### Scope Building - -Symbols can be scoped by the module, submodule, or procedure that contains -that symbol. After the `_Q` sigil, names are constructed from outermost to -innermost scope as - - * Module name prefixed with `M` - * Submodule name prefixed with `S` - * Procedure name prefixed with `F` - -Given: -``` - submodule (mod:s1mod) s2mod - ... - subroutine sub - ... 
- contains - function fun -``` - -The uniqued name of `fun` becomes: -``` - _QMmodSs1modSs2modFsubPfun -``` - -### Common blocks - - * A common block name will be prefixed with `B` - -Given: -``` - common /variables/ i, j -``` - -The uniqued name of `variables` becomes: -``` - _QBvariables -``` - -Given: -``` - common i, j -``` - -The uniqued name in case of `blank common block` becomes: -``` - _QB -``` - -### Module scope global data - - * A global data entity is prefixed with `E` - * A global entity that is constant (parameter) will be prefixed with `EC` - -Given: -``` - module mod - integer :: intvar - real, parameter :: pi = 3.14 - end module -``` - -The uniqued name of `intvar` becomes: -``` - _QMmodEintvar -``` - -The uniqued name of `pi` becomes: -``` - _QMmodECpi -``` - -### Procedures/Subprograms - - * A procedure/subprogram is prefixed with `P` - -Given: -``` - subroutine sub -``` -The uniqued name of `sub` becomes: -``` - _QPsub -``` - -### Derived types and related - - * A derived type is prefixed with `T` - * If a derived type has KIND parameters, they are listed in a consistent - canonical order where each takes the form `Ki` and where _i_ is the - compile-time constant value. (All type parameters are integer.) If _i_ - is a negative value, the prefix `KN` will be used and _i_ will reflect - the magnitude of the value. - -Given: -``` - module mymodule - type mytype - integer :: member - end type - ... -``` -The uniqued name of `mytype` becomes: -``` - _QMmymoduleTmytype -``` - -Given: -``` - type yourtype(k1,k2) - integer, kind :: k1, k2 - real :: mem1 - complex :: mem2 - end type -``` - -The uniqued name of `yourtype` where `k1=4` and `k2=-6` (at compile-time): -``` - _QTyourtypeK4KN6 -``` - - * A derived type dispatch table is prefixed with `D`. The dispatch table - for `type t` would be `_QDTt` - * A type descriptor instance is prefixed with `C`. Intrinsic types can - be encoded with their names and kinds. 
The type descriptor for the - type `yourtype` above would be `_QCTyourtypeK4KN6`. The type - descriptor for `REAL(4)` would be `_QCrealK4`. - -### Compiler generated names - -Compiler generated names do not have to be mapped back to Fortran. These -names will be prefixed with `_QQ` and followed by a unique compiler -generated identifier. There is, of course, no mapping back to a symbol -derived from the input source in this case as no such symbol exists. diff --git a/flang/documentation/C++style.md b/flang/documentation/C++style.md deleted file mode 100644 --- a/flang/documentation/C++style.md +++ /dev/null @@ -1,334 +0,0 @@ - - -## In brief: -* Use *clang-format* -from llvm 7 -on all C++ source and header files before -every merge to master. All code layout should be determined -by means of clang-format. -* Where a clear precedent exists in the project, follow it. -* Otherwise, where [LLVM's C++ style guide](https://llvm.org/docs/CodingStandards.html#style-issues) -is clear on usage, follow it. -* Otherwise, where a good public C++ style guide is relevant and clear, - follow it. [Google's](https://google.github.io/styleguide/cppguide.html) - is pretty good and comes with lots of justifications for its rules. -* Reasonable exceptions to these guidelines can be made. -* Be aware of some workarounds for known issues in older C++ compilers that should - still be able to compile f18. They are listed at the end of this document. - -## In particular: - -Use serial commas in comments, error messages, and documentation -unless they introduce ambiguity. - -### Error messages -1. Messages should be a single sentence with few exceptions. -1. Fortran keywords should appear in upper case. -1. Names from the program appear in single quotes. -1. Messages should start with a capital letter. -1. Messages should not end with a period. - -### Files -1. File names should use dashes, not underscores. C++ sources have the -extension ".cpp", not ".C" or ".cc" or ".cxx". 
Don't create needless -source directory hierarchies. -1. Header files should be idempotent. Use the usual technique: -``` -#ifndef FORTRAN_header_H_ -#define FORTRAN_header_H_ -// code -#endif // FORTRAN_header_H_ -``` -1. `#include` every header defining an entity that your project header or source -file actually uses directly. (Exception: when foo.cpp starts, as it should, -with `#include "foo.h"`, and foo.h includes bar.h in order to define the -interface to the module foo, you don't have to redundantly `#include "bar.h"` -in foo.cpp.) -1. In the source file "foo.cpp", put its corresponding `#include "foo.h"` -first in the sequence of inclusions. -Then `#include` other project headers in alphabetic order; then C++ standard -headers, also alphabetically; then C and system headers. -1. Don't use `#include <iostream>`. If you need it for temporary debugging, -remove the inclusion before committing. - -### Naming -1. C++ names that correspond to well-known interfaces from the STL, LLVM, -and Fortran standard -can and should look like their models when the reader can safely assume that -they mean the same thing -- e.g., `clear()` and `size()` member functions -in a class that implements an STL-ish container. -Fortran intrinsic function names are conventionally in ALL CAPS. -1. Non-public data members should be named with leading minuscule (lower-case) -letters, internal camelCase capitalization, and a trailing underscore, -e.g. `DoubleEntryBookkeepingSystem myLedger_;`. POD structures with -only public data members shouldn't use trailing underscores, since they -don't have class functions from which data members need to be distinguishable. -1. Accessor member functions are named with the non-public data member's name, -less the trailing underscore. Mutator member functions are named `set_...` -and should return `*this`. Don't define accessors or mutators needlessly. -1.
Other class functions should be named with leading capital letters, -CamelCase, and no underscores, and, like all functions, should be based -on imperative verbs, e.g. `HaltAndCatchFire()`. -1. It is fine to use short names for local variables with limited scopes, -especially when you can declare them directly in a `for()`/`while()`/`if()` -condition. Otherwise, prefer complete English words to abbreviations -when creating names. - -### Commentary -1. Use `//` for all comments except for short `/*notes*/` within expressions. -1. When `//` follows code on a line, precede it with two spaces. -1. Comments should matter. Assume that the reader knows current C++ at least as -well as you do and avoid distracting her by calling out usage of new -features in comments. - -### Layout -Always run `clang-format` on your changes before committing code. LLVM -has a `git-clang-format` script to facilitate running clang-format only -on the lines that have changed. - -Here's what you can expect to see `clang-format` do: -1. Indent with two spaces. -1. Don't indent public:, protected:, and private: -accessibility labels. -1. Never use more than 80 characters per source line. -1. Don't use tabs. -1. Don't indent the bodies of namespaces, even when nested. -1. Function result types go on the same line as the function and argument -names. - -Don't try to make columns of variable names or comments -align vertically -- they are maintenance problems. - -Always wrap the bodies of `if()`, `else`, `while()`, `for()`, `do`, &c. -with braces, even when the body is a single statement or empty. The -opening `{` goes on -the end of the line, not on the next line. Functions also put the opening -`{` after the formal arguments or new-style result type, not on the next -line. Use `{}` for empty inline constructors and destructors in classes. 
- -If any branch of an `if`/`else if`/`else` cascade ends with a return statement, -they all should, with the understanding that the cases are all unexceptional. -When testing for an error case that should cause an early return, do so with -an `if` that doesn't have a following `else`. - -Don't waste space on the screen with needless blank lines or elaborate block -commentary (lines of dashes, boxes of asterisks, &c.). Write code so as to be -easily read and understood with a minimum of scrolling. - -Avoid using assignments in controlling expressions of `if()` &c., even with -the idiom of wrapping them with extra parentheses. - -In multi-element initializer lists (especially `common::visitors{...}`), -including a comma after the last element often causes `clang-format` to do -a better job of formatting. - -### C++ language -Use *C++17*, unless some compiler to which we must be portable lacks a feature -you are considering. -However: -1. Never throw or catch exceptions. -1. Never use run-time type information or `dynamic_cast<>`. -1. Never declare static data that executes a constructor. - (This is why `#include <iostream>` is contraindicated.) -1. Use `{braced initializers}` in all circumstances where they work, including -default data member initialization. They inhibit implicit truncation. -Don't use `= expr` initialization just to effect implicit truncation; -prefer an explicit `static_cast<>`. -With C++17, braced initializers work fine with `auto` too. -Sometimes, however, there are better alternatives to empty braces; -e.g., prefer `return std::nullopt;` to `return {};` to make it more clear -that the function's result type is a `std::optional<>`. -1. Avoid unsigned types apart from `size_t`, which must be used with care. -When `int` just obviously works, just use `int`. When you need something -bigger than `int`, use `std::int64_t` rather than `long` or `long long`. -1. Use namespaces to avoid conflicts with client code. Use one top-level -`Fortran` project namespace.
Don't introduce needless nested namespaces within the -project when names don't conflict or better solutions exist. Never use -`using namespace ...;` outside test code; never use `using namespace std;` -anywhere. Access STL entities with names like `std::unique_ptr<>`, -without a leading `::`. -1. Prefer `static` functions over functions in anonymous namespaces in source files. -1. Use `auto` judiciously. When the type of a local variable is known, -monomorphic, and easy to type, be explicit rather than using `auto`. -Don't use `auto` functions unless the type of the result of an outlined member -function definition can be more clear due to its use of types declared in the -class. -1. Use move semantics and smart pointers to make dynamic memory ownership -clear. Consider reworking any code that uses `malloc()` or a (non-placement) -`operator new`. -See the section on Pointers below for some suggested options. -1. When defining argument types, use values when object semantics are -not required and the value is small and copyable without allocation -(e.g., `int`); -use `const` or rvalue references for larger values (e.g., `std::string`); -use `const` references rather than pointers to immutable objects; -and use non-`const` references for mutable objects, including "output" arguments -when they can't be function results. -Put such output arguments last (_pace_ the standard C library conventions for `memcpy()` & al.). -1. Prefer `typename` to `class` in template argument declarations. -1. Prefer `enum class` to plain `enum` wherever `enum class` will work. -We have an `ENUM_CLASS` macro that helps capture the names of constants. -1. Use `constexpr` and `const` generously. -1. When a `switch()` statement's labels do not cover all possible case values -explicitly, it should contain either a `default:;` at its end or a -`default:` label that obviously crashes; we have a `CRASH_NO_CASE` macro -for such situations. -1.
On the other hand, when a `switch()` statement really does cover all of -the values of an `enum class`, please insert a call to the `SWITCH_COVERS_ALL_CASES` -macro at the top of the block. This macro does the right thing for G++ and -clang to ensure that no warning is emitted when the cases are indeed all covered. -1. When using `std::optional` values, avoid unprotected access to their content. -This is usually by means of `x.has_value()` guarding execution of `*x`. -This is implicit when they are function results assigned to local variables -in `if`/`while` predicates. -When no presence test is obviously protecting a `*x` reference to the -contents, and it is assumed that the contents are present, validate that -assumption by using `x.value()` instead. -1. We use `c_str()` rather than `data()` when converting a `std::string` -to a `const char *` when the result is expected to be NUL-terminated. -1. Avoid explicit comparisons of pointers to `nullptr` and tests of -presence of `optional<>` values with `.has_value()` in the predicate -expressions of control flow statements, but prefer them to implicit -conversions to `bool` when initializing `bool` variables and arguments, -and to the use of the idiom `!!`. - -#### Classes -1. Define POD structures with `struct`. -1. Don't use `this->` in (non-static) member functions, unless forced to -do so in a template member function. -1. Define accessor and mutator member functions (implicitly) inline in the -class, after constructors and assignments. Don't needlessly define -(implicit) inline member functions in classes unless they really solve a -performance problem. -1. Try to make class definitions in headers concise specifications of -interfaces, at least to the extent that C++ allows. -1. When copy constructors and copy assignment are not necessary, -and move constructors/assignment are present, don't declare them and they -will be implicitly deleted.
When neither copy nor move constructors -or assignments should exist for a class, explicitly `=delete` all of them. -1. Make single-argument constructors (other than copy and move constructors) -'explicit' unless you really want to define an implicit conversion. - -#### Pointers -There are many -- perhaps too many -- means of indirect addressing -data in this project. -Some of these are standard C++ language and library features, -while others are local inventions in `lib/Common`: -* Bare pointers (`Foo *p`): these are obviously nullable, non-owning, -undefined when uninitialized, shallowly copyable, reassignable, and often -not the right abstraction to use in this project. -But they can be the right choice to represent an optional -non-owning reference, as in a function result. -Use the `DEREF()` macro to convert a pointer to a reference that isn't -already protected by an explicit test for null. -* References (`Foo &r`, `const Foo &r`): non-nullable, not owning, -shallowly copyable, and not reassignable. -References are great for invisible indirection to objects whose lifetimes are -broader than that of the reference. -Take care when initializing a reference with another reference to ensure -that a copy is not made because only one of the references is `const`; -this is a pernicious C++ language pitfall! -* Rvalue references (`Foo &&r`): These are non-nullable references -*with* ownership, and they are ubiquitously used for formal arguments -wherever appropriate. -* `std::reference_wrapper<>`: non-nullable, not owning, shallowly -copyable, and (unlike bare references) reassignable, so suitable for -use in STL containers and for data members in classes that need to be -copyable or assignable. -* `common::Reference<>`: like `std::reference_wrapper<>`, but also supports -move semantics, member access, and comparison for equality; suitable for use in -`std::variant<>`. 
-* `std::unique_ptr<>`: A nullable pointer with ownership, null by default, -not copyable, reassignable. -F18 has a helpful `Deleter<>` class template that makes `unique_ptr<>` -easier to use with forward-referenced data types. -* `std::shared_ptr<>`: A nullable pointer with shared ownership via reference -counting, null by default, shallowly copyable, reassignable, and slow. -* `Indirection<>`: A non-nullable pointer with ownership and -optional deep copy semantics; reassignable. -Often better than a reference (due to ownership) or `std::unique_ptr<>` -(due to non-nullability and copyability). -Can be wrapped in `std::optional<>` when nullability is required. -Usable with forward-referenced data types with some use of `extern template` -in headers and explicit template instantiation in source files. -* `CountedReference<>`: A nullable pointer with shared ownership via -reference counting, null by default, shallowly copyable, reassignable. -Safe to use *only* when the data are private to just one -thread of execution. -Used sparingly in place of `std::shared_ptr<>` only when the overhead -of that standard feature is prohibitive. - -A feature matrix: - -| indirection | nullable | default null | owning | reassignable | copyable | undefined type ok? 
| -| ----------- | -------- | ------------ | ------ | ------------ | -------- | ------------------ | -| `*p` | yes | no | no | yes | shallowly | yes | -| `&r` | no | n/a | no | no | shallowly | yes | -| `&&r` | no | n/a | yes | no | shallowly | yes | -| `reference_wrapper<>` | no | n/a | no | yes | shallowly | yes | -| `Reference<>` | no | n/a | no | yes | shallowly | yes | -| `unique_ptr<>` | yes | yes | yes | yes | no | yes, with work | -| `shared_ptr<>` | yes | yes | yes | yes | shallowly | no | -| `Indirection<>` | no | n/a | yes | yes | optionally deeply | yes, with work | -| `CountedReference<>` | yes | yes | yes | yes | shallowly | no | - -### Overall design preferences -Don't use dynamic solutions to solve problems that can be solved at -build time; don't solve build time problems by writing programs that -produce source code when macros and templates suffice; don't write macros -when templates suffice. Templates are statically typed, checked by the -compiler, and are (or should be) visible to debuggers. - -### Exceptions to these guidelines -Reasonable exceptions will be allowed; these guidelines cannot anticipate -all situations. -For example, names that come from other sources might be more clear if -their original spellings are preserved rather than mangled to conform -needlessly to the conventions here, as Google's C++ style guide does -in a way that leads to weirdly capitalized abbreviations in names -like `Http`. -Consistency is one of many aspects in the pursuit of clarity, -but not an end in itself. - -## C++ compiler bug workarounds -Below is a list of workarounds for C++ compiler bugs met with f18 that, even -if the bugs are fixed in latest C++ compiler versions, need to be applied so -that all desired tool-chains can compile f18. 
- -### Explicitly move noncopyable local variable into optional results - -The following code is legal C++ but fails to compile with the -default Ubuntu 18.04 g++ compiler (7.4.0-1ubuntu1~18.0.4.1): - -``` -class CantBeCopied { - public: - CantBeCopied(const CantBeCopied&) = delete; - CantBeCopied(CantBeCopied&&) = default; - CantBeCopied() {} -}; -std::optional<CantBeCopied> fooNOK() { - CantBeCopied result; - return result; // Legal C++, but does not compile with Ubuntu 18.04 default g++ -} -std::optional<CantBeCopied> fooOK() { - CantBeCopied result; - return {std::move(result)}; // Compiles OK everywhere -} -``` -The underlying bug is actually not specific to `std::optional` but this is the most common -case in f18 where the issue may occur. The actual bug can be reproduced with any class `B` -that has a perfect forwarding constructor taking `CantBeCopied` as argument: -`template<typename CantBeCopied> B(CantBeCopied&& x) x_{std::forward<CantBeCopied>(x)} {}`. -In such scenarios, Ubuntu 18.04 g++ fails to instantiate the move constructor -and to construct the returned value as it should, instead it complains about a -missing copy constructor. - -Local result variables do not need to and should not be explicitly moved into optionals -if they have a copy constructor. diff --git a/flang/documentation/Directives.md b/flang/documentation/Directives.md deleted file mode 100644 --- a/flang/documentation/Directives.md +++ /dev/null @@ -1,14 +0,0 @@ - - -Compiler directives supported by F18 -==================================== - -* `!dir$ fixed` and `!dir$ free` select Fortran source forms. Their effect - persists to the end of the current source file. -* `!dir$ ignore_tkr (tkr) var-list` omits checks on type, kind, and/or rank.
diff --git a/flang/documentation/IORuntimeInternals.md b/flang/documentation/IORuntimeInternals.md deleted file mode 100644 --- a/flang/documentation/IORuntimeInternals.md +++ /dev/null @@ -1,342 +0,0 @@ - - -Fortran I/O Runtime Library Internal Design -=========================================== - -This note is meant to be an overview of the design of the *implementation* -of the f18 Fortran compiler's runtime support library for I/O statements. - -The *interface* to the I/O runtime support library is defined in the -C++ header file `runtime/io-api.h`. -This interface was designed to minimize the amount of complexity exposed -to its clients, which are of course the sequences of calls generated by -the compiler to implement each I/O statement. -By keeping this interface as simple as possible, we hope that we have -lowered the risk of future incompatible changes that would necessitate -recompilation of Fortran codes in order to link with later versions of -the runtime library. -As one will see in `io-api.h`, the interface is also directly callable -from C and C++ programs. - -The I/O facilities of the Fortran 2018 language are specified in the -language standard in its clauses 12 (I/O statements) and 13 (`FORMAT`). -It's a complicated collection of language features: - * Files can comprise *records* or *streams*. - * Records can be fixed-length or variable-length. - * Record files can be accessed sequentially or directly (random access). - * Files can be *formatted*, or *unformatted* raw bits. - * `CHARACTER` scalars and arrays can be used as if they were -fixed-length formatted sequential record files. - * Formatted I/O can be under control of a `FORMAT` statement -or `FMT=` specifier, *list-directed* with default formatting chosen -by the runtime, or `NAMELIST`, in which a collection of variables -can be given a name and passed as a group to the runtime library. 
- * Sequential records of a file can be partially processed by one -or more *non-advancing* I/O statements and eventually completed by -another. - * `FORMAT` strings can manipulate the position in the current -record arbitrarily, causing re-reading or overwriting. - * Floating-point output formatting supports more rounding modes -than the IEEE standard for floating-point arithmetic. - -The Fortran I/O runtime support library is written in C++17, and -uses some C++17 standard library facilities, but it is intended -to not have any link-time dependences on the C++ runtime support -library or any LLVM libraries. -This is important because there are at least two C++ runtime support -libraries, and we don't want Fortran application builders to have to -build multiple versions of their codes; neither do we want to require -them to ship LLVM libraries along with their products. - -Consequently, dynamic memory allocation in the Fortran runtime -uses only C's `malloc()` and `free()` functions, and the few -C++ standard class templates that we instantiate in the library have been -modified with optional template arguments that override their -allocators and deallocators. - -Conversions between the many binary floating-point formats supported -by f18 and their decimal representations are performed with the same -template library of fast conversion algorithms used to interpret -floating-point values in Fortran source programs and to emit them -to module files. - -Overview of Classes -=================== - -A suite of C++ classes and class templates are composed to construct -the Fortran I/O runtime support library. -They (mostly) reside in the C++ namespace `Fortran::runtime::io`. -They are summarized here in a bottom-up order of dependence. - -The header and C++ implementation source file names of these -classes are in the process of being vigorously rearranged and -modified; use `grep` or an IDE to discover these classes in -the source for now. (Sorry!) 
- -`Terminator` ----------- -A general facility for the entire library, `Terminator` latches a -source program statement location in terms of an unowned pointer to -its source file path name and line number and uses them to construct -a fatal error message if needed. -It is used for both user program errors and internal runtime library crashes. - -`IoErrorHandler` --------------- -When I/O error conditions arise at runtime that the Fortran program -might have the privilege to handle itself via `ERR=`, `END=`, or -`EOR=` labels and/or by an `IOSTAT=` variable, this subclass of -`Terminator` is used to either latch the error indication or to crash. -It sorts out priorities in the case of multiple errors and determines -the final `IOSTAT=` value at the end of an I/O statement. - -`MutableModes` ------------- -Fortran's formatted I/O statements are affected by a suite of -modes that can be configured by `OPEN` statements, overridden by -data transfer I/O statement control lists, and further overridden -between data items with control edit descriptors in a `FORMAT` string. -These modes are represented with a `MutableModes` instance, and these -are instantiated and copied where one would expect them to be in -order to properly isolate their modifications. -The modes in force at the time each data item is processed constitute -a member of each `DataEdit`. - -`DataEdit` --------- -Represents a single data edit descriptor from a `FORMAT` statement -or `FMT=` character value, with some hidden extensions to also -support formatting of list-directed transfers. -It holds an instance of `MutableModes`, and also has a repetition -count for when an array appears as a data item in the *io-list*. -For simplicity and efficiency, each data edit descriptor is -encoded in the `DataEdit` as a simple capitalized character -(or two) and some optional field widths. 
- -`FormatControl<>` ---------------- -This class template traverses a `FORMAT` statement's contents (or `FMT=` -character value) to extract data edit descriptors like `E20.14` to -serve each item in an I/O data transfer statement's *io-list*, -making callbacks to an instance of its class template argument -along the way to effect character literal output and record -positioning. -The Fortran language standard defines formatted I/O as if the `FORMAT` -string were driving the traversal of the data items in the *io-list*, -but our implementation reverses that perspective to allow a more -convenient (for the compiler) I/O runtime support library API design -in which each data item is presented to the library with a distinct -type-dependent call. - -Clients of `FormatControl` instantiations call its `GetNextDataEdit()` -member function to acquire the next data edit descriptor to be processed -from the format, and `FinishOutput()` to flush out any remaining -output strings or record positionings at the end of the *io-list*. - -The `DefaultFormatControlCallbacks` structure summarizes the API -expected by `FormatControl` from its class template actual arguments. - -`OpenFile` --------- -This class encapsulates all (I hope) the operating system interfaces -used to interact with the host's filesystems for operations on -external units. -Asynchronous I/O interfaces are faked for now with synchronous -operations and deferred results. - -`ConnectionState` ---------------- -An active connection to an external or internal unit maintains -the common parts of its state in this subclass of `ConnectionAttributes`. -The base class holds state that should not change during the -lifetime of the connection, while the subclass maintains state -that may change during I/O statement execution. 
- -`InternalDescriptorUnit` ----------------------- -When I/O is being performed from/to a Fortran `CHARACTER` array -rather than an external file, this class manages the standard -interoperable descriptor used to access its elements as records. -It has the necessary interfaces to serve as an actual argument -to the `FormatControl` class template. - -`FileFrame<>` ------------ -This CRTP class template isolates all of the complexity involved between -an external unit's `OpenFile` and the buffering requirements -imposed by the capabilities of Fortran `FORMAT` control edit -descriptors that allow repositioning within the current record. -Its interface enables its clients to define a "frame" (my term, -not Fortran's) that is a contiguous range of bytes that are -or may soon be in the file. -This frame is defined as a file offset and a byte size. -The `FileFrame` instance manages an internal circular buffer -with two essential guarantees: - -1. The most recently requested frame is present in the buffer -and contiguous in memory. -1. Any extra data after the frame that may have been read from -the external unit will be preserved, so that it's safe to -read from a socket, pipe, or tape and not have to worry about -repositioning and rereading. - -In end-of-file situations, it's possible that a request to read -a frame may come up short. - -As a CRTP class template, `FileFrame` accesses the raw filesystem -facilities it needs from `*this`. - -`ExternalFileUnit` ----------------- -This class mixes in `ConnectionState`, `OpenFile`, and -`FileFrame` to represent the state of an open -(or soon to be opened) external file descriptor as a Fortran -I/O unit. -It has the contextual APIs required to serve as a template actual -argument to `FormatControl`. -And it contains a `std::variant<>` suitable for holding the -state of the active I/O statement in progress on the unit -(see below). 
- -`ExternalFileUnit` instances reside in a `Map` that is allocated -as a static variable and indexed by Fortran unit number. -Static member functions `LookUp()`, `LookUpOrCrash()`, and `LookUpOrCreate()` -probe the map to convert Fortran `UNIT=` numbers from I/O statements -into references to active units. - -`IoStatementBase` ---------------- -The subclasses of `IoStatementBase` each encapsulate and maintain -the state of one active Fortran I/O statement across the several -I/O runtime library API function calls it may comprise. -The subclasses handle the distinctions between internal vs. external I/O, -formatted vs. list-directed vs. unformatted I/O, input vs. output, -and so on. - -`IoStatementBase` inherits default `FORMAT` processing callbacks and -an `IoErrorHandler`. -Each of the `IoStatementBase` classes that pertain to formatted I/O -support the contextual callback interfaces needed by `FormatControl`, -overriding the default callbacks of the base class, which crash if -called inappropriately (e.g., if a `CLOSE` statement somehow -passes a data item from an *io-list*). - -The lifetimes of these subclasses' instances each begin with a user -program call to an I/O API routine with a name like `BeginExternalListOutput()` -and persist until `EndIoStatement()` is called. - -To reduce dynamic memory allocation, *external* I/O statements allocate -their per-statement state class instances in space reserved in the -`ExternalFileUnit` instance. -Internal I/O statements currently use dynamic allocation, but -the I/O API supports a means whereby the code generated for the Fortran -program may supply stack space to the I/O runtime support library -for this purpose. - -`IoStatementState` ----------------- -F18's Fortran I/O runtime support library defines and implements an API -that uses a sequence of function calls to implement each Fortran I/O -statement. 
-The state of each I/O statement in progress is maintained in some -subclass of `IoStatementBase`, as noted above. -The purpose of `IoStatementState` is to provide generic access -to the specific state classes without recourse to C++ `virtual` -functions or function pointers, language features that may not be -available to us in some important execution environments. -`IoStatementState` comprises a `std::variant<>` of wrapped references -to the various possibilities, and uses `std::visit()` to -access them as needed by the I/O API calls that process each specifier -in the I/O *control-list* and each item in the *io-list*. - -Pointers to `IoStatementState` instances are the `Cookie` type returned -in the I/O API for `Begin...` I/O statement calls, passed back for -the *control-list* specifiers and *io-list* data items, and consumed -by the `EndIoStatement()` call at the end of the statement. - -Storage for `IoStatementState` is reserved in `ExternalFileUnit` for -external I/O units, and in the various final subclasses for internal -I/O statement states otherwise. - -Since Fortran permits a `CLOSE` statement to reference a nonexistent -unit, the library has to treat that (expected to be rare) situation -as a weird variation of internal I/O since there's no `ExternalFileUnit` -available to hold its `IoStatementBase` subclass or `IoStatementState`. - -A Narrative Overview Of `PRINT *, 'HELLO, WORLD'` -================================================= -1. When the compiled Fortran program begins execution at the `main()` -entry point exported from its main program, it calls `ProgramStart()` -with its arguments and environment. -1. The generated code calls `BeginExternalListOutput()` to -start the sequence of calls that implement the `PRINT` statement. 
-Since the Fortran runtime I/O library has not yet been used in -this process, its data structures are initialized on this -first call, and Fortran I/O units 5 and 6 are connected with -the standard input and output file descriptors (respectively). -The default unit code is converted to 6 and passed to -`ExternalFileUnit::LookUpOrCrash()`, which returns a reference to -unit 6's instance. -1. We check that the unit was opened for formatted I/O. -1. `ExternalFileUnit::BeginIoStatement<>()` is called to initialize -an instance of `ExternalListIoStatementState` in the unit, -point to it with an `IoStatementState`, and return a reference to -that object whose address will be the `Cookie` for this statement. -1. The generated code calls `OutputAscii()` with that cookie and the -address and length of the string. -1. `OutputAscii()` confirms that the cookie corresponds to an output -statement and determines that it's list-directed. -1. `ListDirectedStatementState::EmitLeadingSpaceOrAdvance()` -emits the required initial space on the new current output record -by calling `IoStatementState::GetConnectionState()` to locate -the connection state, determining from the record position state -that the space is necessary, and calling `IoStatementState::Emit()` -to cough it out. That call is redirected to `ExternalFileUnit::Emit()`, -which calls `FileFrame::WriteFrame()` to extend -the frame of the current record and then `memcpy()` to fill its -first byte with the space. -1. Back in `OutputAscii()`, the mutable modes and connection state -of the `IoStatementState` are queried to see whether we're in a -`WRITE(UNIT=,FMT=,DELIM=)` statement with a delimited specifier. -If we were, the library would emit the appropriate quote marks, -double up any instances of that character in the text, and split the -text over multiple records if it's long. -1. But we don't have a delimiter, so `OutputAscii()` just carves -up the text into record-sized chunks and emits them.
There's just -one chunk for our short `CHARACTER` string value in this example. -It's passed to `IoStatementState::Emit()`, which (as above) is -redirected to `ExternalFileUnit::Emit()`, which interacts with the -frame to extend the frame and `memcpy` data into the buffer. -1. A flag is set in `ListDirectedStatementState` to remember -that the last item emitted in this list-directed output statement -was an undelimited `CHARACTER` value, so that if the next item is -also an undelimited `CHARACTER`, no interposing space will be emitted -between them. -1. `OutputAscii()` returns `true` to its caller. -1. The generated code calls `EndIoStatement()`, which is redirected to -`ExternalIoStatementState`'s override of that function. -As this is not a non-advancing I/O statement, `ExternalFileUnit::AdvanceRecord()` -is called to end the record. Since this is a sequential formatted -file, a newline is emitted. -1. If unit 6 is connected to a terminal, the buffer is flushed. -`FileFrame::Flush()` drives `ExternalFileUnit::Write()` -to push out the data in maximal contiguous chunks, dealing with any -short writes that might occur, and collecting I/O errors along the way. -This statement has no `ERR=` label or `IOSTAT=` specifier, so errors -arriving at `IoErrorHandler::SignalErrno()` will cause an immediate -crash. -1. `ExternalIoStatementBase::EndIoStatement()` is called. -It gets the final `IOSTAT=` value from `IoStatementBase::EndIoStatement()`, -tells the `ExternalFileUnit` that no I/O statement remains active, and -returns the I/O status value back to the program. -1. Eventually, the program calls `ProgramEndStatement()`, which -calls `ExternalFileUnit::CloseAll()`, which flushes and closes all -open files. If the standard output were not a terminal, the output -would be written now with the same sequence of calls as above. -1. `exit(EXIT_SUCCESS)`.
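The `IoStatementState` description above (a `std::variant<>` of wrapped references accessed through `std::visit()`, with no `virtual` functions) can be illustrated with a minimal sketch. The names `ListOutputState`, `CloseState`, `StatementState`, and `Demo` are hypothetical stand-ins invented for this illustration and are not the runtime library's actual classes or signatures.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <variant>

// Hypothetical stand-ins for two per-statement state classes.
struct ListOutputState {
  std::string buffer;
  bool Emit(const std::string &text) {
    buffer += text;
    return true;
  }
};
struct CloseState {
  // A CLOSE statement has no io-list, so emitting data is reported as failure.
  bool Emit(const std::string &) { return false; }
};

// Generic access without virtual functions or function pointers:
// a variant of wrapped references, dispatched with std::visit().
class StatementState {
public:
  explicit StatementState(ListOutputState &x) : u_{std::ref(x)} {}
  explicit StatementState(CloseState &x) : u_{std::ref(x)} {}
  bool Emit(const std::string &text) {
    return std::visit([&](auto &ref) { return ref.get().Emit(text); }, u_);
  }

private:
  std::variant<std::reference_wrapper<ListOutputState>,
      std::reference_wrapper<CloseState>>
      u_;
};

// Usage sketch: drives both alternatives through the same interface.
void Demo() {
  ListOutputState out;
  CloseState close;
  StatementState a{out}, b{close};
  assert(a.Emit(" HELLO, WORLD")); // reaches ListOutputState::Emit
  assert(!b.Emit("ignored"));      // reaches CloseState::Emit
  assert(out.buffer == " HELLO, WORLD");
}
```

The dispatcher holds references only, so the storage for the concrete state objects can live elsewhere, which is in the spirit of the note's remark about reserving that storage in the unit.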
diff --git a/flang/documentation/ImplementingASemanticCheck.md b/flang/documentation/ImplementingASemanticCheck.md deleted file mode 100644 --- a/flang/documentation/ImplementingASemanticCheck.md +++ /dev/null @@ -1,832 +0,0 @@ - -# Introduction -I recently added a semantic check to the f18 compiler front end. This document -describes my thought process and the resulting implementation. - -For more information about the compiler, start with the -[compiler overview](Overview.md). - -# Problem definition - -In the 2018 Fortran standard, section 11.1.7.4.3, paragraph 2, states that: - -``` -Except for the incrementation of the DO variable that occurs in step (3), the DO variable -shall neither be redefined nor become undefined while the DO construct is active. -``` -One of the ways that DO variables might be redefined is if they are passed to -functions with dummy arguments whose `INTENT` is `INTENT(OUT)` or -`INTENT(INOUT)`. I implemented this semantic check. Specifically, I changed -the compiler to emit an error message if an active DO variable was passed to a -dummy argument of a FUNCTION with INTENT(OUT). Similarly, I had the compiler -emit a warning if an active DO variable was passed to a dummy argument with -INTENT(INOUT). Previously, I had implemented similar checks for SUBROUTINE -calls. - -# Creating a test - -My first step was to create a test case to cause the problem. I called it testfun.f90 and used it to check the behavior of other Fortran compilers. Here's the initial version: - -```fortran - subroutine s() - Integer :: ivar, jvar - - do ivar = 1, 10 - jvar = intentOutFunc(ivar) ! 
Error since ivar is a DO variable - end do - - contains - function intentOutFunc(dummyArg) - integer, intent(out) :: dummyArg - integer :: intentOutFunc - - dummyArg = 216 - end function intentOutFunc - end subroutine s -``` - -I verified that other Fortran compilers produced an error message at the point -of the call to `intentOutFunc()`: - -```fortran - jvar = intentOutFunc(ivar) ! Error since ivar is a DO variable -``` - - -I also used this program to produce a parse tree for the program using the command: -```bash - f18 -fdebug-dump-parse-tree -fparse-only testfun.f90 -``` - -Here's the relevant fragment of the parse tree produced by the compiler: - -``` -| | ExecutionPartConstruct -> ExecutableConstruct -> DoConstruct -| | | NonLabelDoStmt -| | | | LoopControl -> LoopBounds -| | | | | Scalar -> Name = 'ivar' -| | | | | Scalar -> Expr = '1_4' -| | | | | | LiteralConstant -> IntLiteralConstant = '1' -| | | | | Scalar -> Expr = '10_4' -| | | | | | LiteralConstant -> IntLiteralConstant = '10' -| | | Block -| | | | ExecutionPartConstruct -> ExecutableConstruct -> ActionStmt -> AssignmentStmt = 'jvar=intentoutfunc(ivar)' -| | | | | Variable -> Designator -> DataRef -> Name = 'jvar' -| | | | | Expr = 'intentoutfunc(ivar)' -| | | | | | FunctionReference -> Call -| | | | | | | ProcedureDesignator -> Name = 'intentoutfunc' -| | | | | | | ActualArgSpec -| | | | | | | | ActualArg -> Expr = 'ivar' -| | | | | | | | | Designator -> DataRef -> Name = 'ivar' -| | | EndDoStmt -> -``` - -Note that this fragment of the tree only shows four `parser::Expr` nodes, -but the full parse tree also contained a fifth `parser::Expr` node for the -constant 216 in the statement: - -```fortran - dummyArg = 216 -``` -# Analysis and implementation planning - -I then considered what I needed to do. I needed to detect situations where an -active DO variable was passed to a dummy argument with `INTENT(OUT)` or -`INTENT(INOUT)`. 
Once I detected such a situation, I needed to produce a -message that highlighted the erroneous source code. - -## Deciding where to add the code to the compiler -This new semantic check would depend on several types of information -- the -parse tree, source code location information, symbols, and expressions. Thus I -needed to put my new code in a place in the compiler after the parse tree had -been created, name resolution had already happened, and expression semantic -checking had already taken place. - -Most semantic checks for statements are implemented by walking the parse tree -and performing analysis on the nodes they visit. My plan was to use this -method. The infrastructure for walking the parse tree for statement semantic -checking is implemented in the files `lib/Semantics/semantics.cpp`. -Here's a fragment of the declaration of the framework's parse tree visitor from -`lib/Semantics/semantics.cpp`: - -```C++ - // A parse tree visitor that calls Enter/Leave functions from each checker - // class C supplied as template parameters. Enter is called before the node's - // children are visited, Leave is called after. No two checkers may have the - // same Enter or Leave function. Each checker must be constructible from - // SemanticsContext and have BaseChecker as a virtual base class. - template class SemanticsVisitor : public virtual C... { - public: - using C::Enter...; - using C::Leave...; - using BaseChecker::Enter; - using BaseChecker::Leave; - SemanticsVisitor(SemanticsContext &context) - : C{context}..., context_{context} {} - ... - -``` - -Since FUNCTION calls are a kind of expression, I was planning to base my -implementation on the contents of `parser::Expr` nodes. I would need to define -either an `Enter()` or `Leave()` function whose parameter was a `parser::Expr` -node. 
Here's the declaration I put into `lib/Semantics/check-do.h`: - -```C++ - void Leave(const parser::Expr &); -``` -The `Enter()` functions get called at the time the node is first visited -- -that is, before its children. The `Leave()` function gets called after the -children are visited. For my check the visitation order didn't matter, so I -arbitrarily chose to implement the `Leave()` function to visit the parse tree -node. - -Since my semantic check was focused on DO CONCURRENT statements, I added it to -the file `lib/Semantics/check-do.cpp` where most of the semantic checking for -DO statements already lived. - -## Taking advantage of prior work -When implementing a similar check for SUBROUTINE calls, I created a utility -functions in `lib/Semantics/semantics.cpp` to emit messages if -a symbol corresponding to an active DO variable was being potentially modified: - -```C++ - void WarnDoVarRedefine(const parser::CharBlock &location, const Symbol &var); - void CheckDoVarRedefine(const parser::CharBlock &location, const Symbol &var); -``` - -The first function is intended for dummy arguments of `INTENT(INOUT)` and -the second for `INTENT(OUT)`. - -Thus I needed three pieces of -information -- -1. the source location of the erroneous text, -2. the `INTENT` of the associated dummy argument, and -3. the relevant symbol passed as the actual argument. - -The first and third are needed since they're required to call the utility -functions. The second is needed to determine whether to call them. - -## Finding the source location -The source code location information that I'd need for the error message must -come from the parse tree. I looked in the file -`include/flang/Parser/parse-tree.h` and determined that a `struct Expr` -contained source location information since it had the field `CharBlock -source`. Thus, if I visited a `parser::Expr` node, I could get the source -location information for the associated expression. 
- -## Determining the `INTENT` -I knew that I could find the `INTENT` of the dummy argument associated with the -actual argument from the function called `dummyIntent()` in the class -`evaluate::ActualArgument` in the file `include/flang/Evaluate/call.h`. So -if I could find an `evaluate::ActualArgument` in an expression, I could - determine the `INTENT` of the associated dummy argument. I knew that it was - valid to call `dummyIntent()` because the data on which `dummyIntent()` - depends is established during semantic processing for expressions, and the - semantic processing for expressions happens before semantic checking for DO - constructs. - -In my prior work on checking the INTENT of arguments for SUBROUTINE calls, -the parse tree held a node for the call (a `parser::CallStmt`) that contained -an `evaluate::ProcedureRef` node. -```C++ - struct CallStmt { - WRAPPER_CLASS_BOILERPLATE(CallStmt, Call); - mutable std::unique_ptr> - typedCall; // filled by semantics - }; -``` -The `evaluate::ProcedureRef` contains a list of `evaluate::ActualArgument` -nodes. I could then find the INTENT of a dummy argument from the -`evaluate::ActualArgument` node. - -For a FUNCTION call, though, there is no similar way to get from a parse tree -node to an `evaluate::ProcedureRef` node. But I knew that there was an -existing framework used in DO construct semantic checking that traversed an -`evaluate::Expr` node collecting `semantics::Symbol` nodes. I guessed that I'd -be able to use a similar framework to traverse an `evaluate::Expr` node to -find all of the `evaluate::ActualArgument` nodes. - -Note that the compiler has multiple types called `Expr`. One is in the -`parser` namespace. `parser::Expr` is defined in the file -`include/flang/Parser/parse-tree.h`. It represents a parsed expression that -maps directly to the source code and has fields that specify any operators in -the expression, the operands, and the source position of the expression. 
- -Additionally, in the namespace `evaluate`, there are `evaluate::Expr` -template classes defined in the file `include/flang/Evaluate/expression.h`. -These are parameterized over the various types of Fortran and constitute a -suite of strongly-typed representations of valid Fortran expressions of type -`T` that have been fully elaborated with conversion operations and subjected to -constant folding. After an expression has undergone semantic analysis, the -field `typedExpr` in the `parser::Expr` node is filled in with a pointer that -owns an instance of `evaluate::Expr`, the most general representation -of an analyzed expression. - -All of the declarations associated with both FUNCTION and SUBROUTINE calls are -in `include/flang/Evaluate/call.h`. An `evaluate::FunctionRef` inherits from -an `evaluate::ProcedureRef` which contains the list of -`evaluate::ActualArgument` nodes. But the relationship between an -`evaluate::FunctionRef` node and its associated arguments is not relevant. I -only needed to find the `evaluate::ActualArgument` nodes in an expression. -They hold all of the information I needed. - -So my plan was to start with the `parser::Expr` node and extract its -associated `evaluate::Expr` field. I would then traverse the -`evaluate::Expr` tree collecting all of the `evaluate::ActualArgument` -nodes. I would look at each of these nodes to determine the `INTENT` of -the associated dummy argument. - -This combination of the traversal framework and `dummyIntent()` would give -me the `INTENT` of all of the dummy arguments in a FUNCTION call. Thus, I -would have the second piece of information I needed. - -## Determining if the actual argument is a variable -I also guessed that I could determine if the `evaluate::ActualArgument` -consisted of a variable. 
- -Once I had a symbol for the variable, I could call one of the functions: -```C++ - void WarnDoVarRedefine(const parser::CharBlock &, const Symbol &); - void CheckDoVarRedefine(const parser::CharBlock &, const Symbol &); -``` -to emit the messages. - -If my plans worked out, this would give me the three pieces of information I -needed -- the source location of the erroneous text, the `INTENT` of the dummy -argument, and a symbol that I could use to determine whether the actual -argument was an active DO variable. - -# Implementation - -## Adding a parse tree visitor -I started my implementation by adding a visitor for `parser::Expr` nodes. -Since this analysis is part of DO construct checking, I did this in -`lib/Semantics/check-do.cpp`. I added a print statement to the visitor to -verify that my new code was actually getting executed. - -In `lib/Semantics/check-do.h`, I added the declaration for the visitor: - -```C++ - void Leave(const parser::Expr &); -``` - -In `lib/Semantics/check-do.cpp`, I added an (almost empty) implementation: - -```C++ - void DoChecker::Leave(const parser::Expr &) { - std::cout << "In Leave for parser::Expr\n"; - } -``` - -I then built the compiler with these changes and ran it on my test program. -This time, I made sure to invoke semantic checking. Here's the command I used: -```bash - f18 -fdebug-resolve-names -fdebug-dump-parse-tree -funparse-with-symbols testfun.f90 -``` - -This produced the output: - -``` - In Leave for parser::Expr - In Leave for parser::Expr - In Leave for parser::Expr - In Leave for parser::Expr - In Leave for parser::Expr -``` - -This made sense since the parse tree contained five `parser::Expr` nodes. -So far, so good. Note that a `parser::Expr` node has a field with the -source position of the associated expression (`CharBlock source`). So I -now had one of the three pieces of information needed to detect and report -errors.
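The `Enter()`/`Leave()` ordering that the visitor relies on can be sketched in isolation. This is a simplified illustration, not the f18 parse tree walker: `Node`, `OrderChecker`, `Walk`, and `WalkExample` are hypothetical names, and `Node::source` merely plays the role of the `CharBlock source` field.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

// Hypothetical, much-simplified stand-in for a parse tree expression node.
struct Node {
  std::string source; // plays the role of `CharBlock source`
  std::vector<std::unique_ptr<Node>> children;
};

// A checker that records the order in which its callbacks fire.
struct OrderChecker {
  std::vector<std::string> log;
  void Enter(const Node &n) { log.push_back("enter " + n.source); }
  void Leave(const Node &n) { log.push_back("leave " + n.source); }
};

// Enter is called before the node's children are visited, Leave afterwards.
template <typename Checker> void Walk(const Node &n, Checker &checker) {
  checker.Enter(n);
  for (const auto &child : n.children) {
    Walk(*child, checker);
  }
  checker.Leave(n);
}

// Builds a two-node tree for `intentoutfunc(ivar)` and returns the
// sequence of callbacks that the walk produces.
std::vector<std::string> WalkExample() {
  Node call{"intentoutfunc(ivar)", {}};
  call.children.push_back(std::make_unique<Node>(Node{"ivar", {}}));
  OrderChecker checker;
  Walk(call, checker);
  assert(checker.log.size() == 4); // one Enter and one Leave per node
  return checker.log;
}
```

In this sketch the `Leave` for `ivar` is logged before the `Leave` for the enclosing call, which is why a check that needs only the node itself can use either callback.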
- -## Collecting the actual arguments -To get the `INTENT` of the dummy arguments and the `semantics::Symbol` associated with the -actual argument, I needed to find all of the actual arguments embedded in an -expression that contained a FUNCTION call. So my next step was to write the -framework to walk the `evaluate::Expr` to gather all of the -`evaluate::ActualArgument` nodes. The code that I planned to model it on -was the existing infrastructure that collected all of the `semantics::Symbol` nodes from an -`evaluate::Expr`. I found this implementation in -`lib/Evaluate/tools.cpp`: - -```C++ - struct CollectSymbolsHelper - : public SetTraverse<CollectSymbolsHelper, semantics::SymbolSet> { - using Base = SetTraverse<CollectSymbolsHelper, semantics::SymbolSet>; - CollectSymbolsHelper() : Base{*this} {} - using Base::operator(); - semantics::SymbolSet operator()(const Symbol &symbol) const { - return {symbol}; - } - }; - template<typename A> semantics::SymbolSet CollectSymbols(const A &x) { - return CollectSymbolsHelper{}(x); - } -``` - -Note that the `CollectSymbols()` function returns a `semantics::SymbolSet`, -which is declared in `include/flang/Semantics/symbol.h`: - -```C++ - using SymbolSet = std::set<SymbolRef>; -``` - -This infrastructure yields a collection based on `std::set<>`. Using an -`std::set<>` means that if the same object is inserted twice, the -collection only gets one copy. This was the behavior that I wanted. - -Here's a sample invocation of `CollectSymbols()` that I found: -```C++ - if (const auto *expr{GetExpr(parsedExpr)}) { - for (const Symbol &symbol : evaluate::CollectSymbols(*expr)) { -``` - -I noted that a `SymbolSet` did not actually contain an -`std::set<Symbol>`. This wasn't surprising since we don't want to put the -full `semantics::Symbol` objects into the set. Ideally, we would be able to create an -`std::set<Symbol &>` (a set of C++ references to symbols). But C++ doesn't -support sets that contain references.
This limitation is part of the rationale -for the f18 implementation of type `common::Reference<>`, which is defined in - `include/flang/Common/reference.h`. - -`SymbolRef`, the specialization of the template `common::Reference<>` for -`semantics::Symbol`, is declared in the file -`include/flang/Semantics/symbol.h`: - -```C++ - using SymbolRef = common::Reference<const Symbol>; -``` - -So to implement something that would collect `evaluate::ActualArgument` -nodes from an `evaluate::Expr`, I first defined the required types -`ActualArgumentRef` and `ActualArgumentSet`. Since these are being -used exclusively for DO construct semantic checking (currently), I put their -definitions into `lib/Semantics/check-do.cpp`: - - -```C++ - namespace Fortran::evaluate { - using ActualArgumentRef = common::Reference<const ActualArgument>; - } - - - using ActualArgumentSet = std::set<evaluate::ActualArgumentRef>; -``` - -Since `ActualArgument` is in the namespace `evaluate`, I put the -definition for `ActualArgumentRef` in that namespace, too. - -I then modeled the code to create an `ActualArgumentSet` after the code to -collect a `SymbolSet` and put it into `lib/Semantics/check-do.cpp`: - - -```C++ - struct CollectActualArgumentsHelper - : public evaluate::SetTraverse<CollectActualArgumentsHelper, ActualArgumentSet> { - using Base = SetTraverse<CollectActualArgumentsHelper, ActualArgumentSet>; - CollectActualArgumentsHelper() : Base{*this} {} - using Base::operator(); - ActualArgumentSet operator()(const evaluate::ActualArgument &arg) const { - return ActualArgumentSet{arg}; - } - }; - - template<typename A> ActualArgumentSet CollectActualArguments(const A &x) { - return CollectActualArgumentsHelper{}(x); - } - - template ActualArgumentSet CollectActualArguments(const SomeExpr &); -``` - -Unfortunately, when I tried to build this code, I got an error message saying -`std::set` requires the `<` operator to be defined for its contents. -To fix this, I added a definition for `<`.
I didn't care how `<` was -defined, so I just used the address of the object: - -```C++ - inline bool operator<(ActualArgumentRef x, ActualArgumentRef y) { - return &*x < &*y; - } -``` - -I was surprised that the error message about the missing `<` operator did -not go away. Eventually, I figured out that the definition of -the `<` operator needed to be in the `evaluate` namespace, where -argument-dependent lookup from within `std::set` can find it. Once I put -it there, everything compiled successfully. Here's the code that worked: - -```C++ - namespace Fortran::evaluate { - using ActualArgumentRef = common::Reference<const ActualArgument>; - - inline bool operator<(ActualArgumentRef x, ActualArgumentRef y) { - return &*x < &*y; - } - } -``` - -I then modified my visitor for `parser::Expr` to invoke my new collection -framework. To verify that it was actually doing something, I printed out the -number of `evaluate::ActualArgument` nodes that it collected. Note the -call to `GetExpr()` in the invocation of `CollectActualArguments()`. I -modeled this on similar code that collected a `SymbolSet` described above: - -```C++ - void DoChecker::Leave(const parser::Expr &parsedExpr) { - std::cout << "In Leave for parser::Expr\n"; - ActualArgumentSet argSet{CollectActualArguments(GetExpr(parsedExpr))}; - std::cout << "Number of arguments: " << argSet.size() << "\n"; - } -``` - -I compiled and tested this code on my little test program. Here's the output that I got: -``` - In Leave for parser::Expr - Number of arguments: 0 - In Leave for parser::Expr - Number of arguments: 0 - In Leave for parser::Expr - Number of arguments: 0 - In Leave for parser::Expr - Number of arguments: 1 - In Leave for parser::Expr - Number of arguments: 0 -``` - -So most of the `parser::Expr` nodes contained no actual arguments, but the -fourth expression in the parse tree walk contained a single argument. This may -seem wrong since the third `parser::Expr` node in the file contains the -`FunctionReference` node along with the arguments that we're gathering.
-But since the tree walk function is being called upon leaving a -`parser::Expr` node, the function visits the `parser::Expr` node -associated with the `parser::ActualArg` node before it visits the -`parser::Expr` node associated with the `parser::FunctionReference` -node. - -So far, so good. - -## Finding the `INTENT` of the dummy argument -I now wanted to find the `INTENT` of the dummy argument associated with the -arguments in the set. As mentioned earlier, the type -`evaluate::ActualArgument` has a member function called `dummyIntent()` -that gives this value. So I augmented my code to print out the `INTENT`: - -```C++ - void DoChecker::Leave(const parser::Expr &parsedExpr) { - std::cout << "In Leave for parser::Expr\n"; - ActualArgumentSet argSet{CollectActualArguments(GetExpr(parsedExpr))}; - std::cout << "Number of arguments: " << argSet.size() << "\n"; - for (const evaluate::ActualArgumentRef &argRef : argSet) { - common::Intent intent{argRef->dummyIntent()}; - switch (intent) { - case common::Intent::In: std::cout << "INTENT(IN)\n"; break; - case common::Intent::Out: std::cout << "INTENT(OUT)\n"; break; - case common::Intent::InOut: std::cout << "INTENT(INOUT)\n"; break; - default: std::cout << "default INTENT\n"; - } - } - } -``` - -I then rebuilt my compiler and ran it on my test case. This produced the following output: - -``` - In Leave for parser::Expr - Number of arguments: 0 - In Leave for parser::Expr - Number of arguments: 0 - In Leave for parser::Expr - Number of arguments: 0 - In Leave for parser::Expr - Number of arguments: 1 - INTENT(OUT) - In Leave for parser::Expr - Number of arguments: 0 -``` - -I then modified my test case to convince myself that I was getting the correct -`INTENT` for `IN`, `INOUT`, and default cases. - -So far, so good. - -## Finding the symbols for arguments that are variables -The third and last piece of information I needed was to determine if a variable -was being passed as an actual argument. 
In such cases, I wanted to get the -symbol table node (`semantics::Symbol`) for the variable. My starting point was the -`evaluate::ActualArgument` node. - -I was unsure of how to do this, so I browsed through existing code to look for -how it treated `evaluate::ActualArgument` objects. Since most of the code that deals with the `evaluate` namespace is in the lib/Evaluate directory, I looked there. I ran `grep` on all of the `.cpp` files looking for -uses of `ActualArgument`. One of the first hits I got was in `lib/Evaluate/call.cpp` in the definition of `ActualArgument::GetType()`: - -```C++ -std::optional ActualArgument::GetType() const { - if (const Expr *expr{UnwrapExpr()}) { - return expr->GetType(); - } else if (std::holds_alternative(u_)) { - return DynamicType::AssumedType(); - } else { - return std::nullopt; - } -} -``` - -I noted the call to `UnwrapExpr()` that yielded a value of -`Expr`. So I guessed that I could use this member function to -get an `evaluate::Expr` on which I could perform further analysis. - -I also knew that the header file `include/flang/Evaluate/tools.h` held many -utility functions for dealing with `evaluate::Expr` objects. I was hoping to -find something that would determine if an `evaluate::Expr` was a variable. So -I searched for `IsVariable` and got a hit immediately. -```C++ - template bool IsVariable(const A &x) { - if (auto known{IsVariableHelper{}(x)}) { - return *known; - } else { - return false; - } - } -``` - -But I actually needed more than just the knowledge that an `evaluate::Expr` was -a variable. I needed the `semantics::Symbol` associated with the variable. So -I searched in `include/flang/Evaluate/tools.h` for functions that returned a -`semantics::Symbol`. I found the following: - -```C++ -// If an expression is simply a whole symbol data designator, -// extract and return that symbol, else null. 
-template<typename A> const Symbol *UnwrapWholeSymbolDataRef(const A &x) { - if (auto dataRef{ExtractDataRef(x)}) { - if (const SymbolRef * p{std::get_if<SymbolRef>(&dataRef->u)}) { - return &p->get(); - } - } - return nullptr; -} -``` - -This was exactly what I wanted. DO variables must be whole symbols. So I -could try to extract a whole `semantics::Symbol` from the `evaluate::Expr` in my -`evaluate::ActualArgument`. If this extraction resulted in a `semantics::Symbol` -that wasn't a `nullptr`, I could then conclude that it was a variable that I -could pass to existing functions that would determine if it was an active DO -variable. - -I then modified the compiler to perform the analysis that I'd guessed would -work: - -```C++ - void DoChecker::Leave(const parser::Expr &parsedExpr) { - std::cout << "In Leave for parser::Expr\n"; - ActualArgumentSet argSet{CollectActualArguments(GetExpr(parsedExpr))}; - std::cout << "Number of arguments: " << argSet.size() << "\n"; - for (const evaluate::ActualArgumentRef &argRef : argSet) { - if (const SomeExpr * argExpr{argRef->UnwrapExpr()}) { - std::cout << "Got an unwrapped Expr\n"; - if (const Symbol * var{evaluate::UnwrapWholeSymbolDataRef(*argExpr)}) { - std::cout << "Found a whole variable: " << *var << "\n"; - } - } - common::Intent intent{argRef->dummyIntent()}; - switch (intent) { - case common::Intent::In: std::cout << "INTENT(IN)\n"; break; - case common::Intent::Out: std::cout << "INTENT(OUT)\n"; break; - case common::Intent::InOut: std::cout << "INTENT(INOUT)\n"; break; - default: std::cout << "default INTENT\n"; - } - } - } -``` - -Note the line that prints out the symbol table entry for the variable: - -```C++ - std::cout << "Found a whole variable: " << *var << "\n"; -``` - -The compiler defines the "<<" operator for `semantics::Symbol`, which is handy -for analyzing the compiler's behavior.
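The extraction step above relies on a common C++ idiom: ask a `std::variant` whether it currently holds one particular alternative, and get back a pointer that is null when it does not. The following is only a sketch of that idiom under simplified assumptions. `ToySymbol`, `ToySymbolRef`, `ToyArrayElement`, `ToyDataRef`, and `UnwrapWholeToySymbol` are hypothetical stand-ins made up for this note; they are not the f18 classes, and `std::reference_wrapper` is used here in place of `common::Reference`.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <variant>

// Hypothetical stand-ins; these are not the f18 classes.
struct ToySymbol {
  std::string name;
};
using ToySymbolRef = std::reference_wrapper<const ToySymbol>;

// A designator that is more than a whole symbol, e.g. an array element.
struct ToyArrayElement {
  ToySymbolRef base;
  int subscript;
};

// A data reference is either a whole symbol or something more complex.
struct ToyDataRef {
  std::variant<ToySymbolRef, ToyArrayElement> u;
};

// Return the symbol if the reference is a whole symbol, else nullptr.
inline const ToySymbol *UnwrapWholeToySymbol(const ToyDataRef &dataRef) {
  if (const ToySymbolRef *p{std::get_if<ToySymbolRef>(&dataRef.u)}) {
    return &p->get();
  }
  return nullptr;
}
```

A caller can then test the returned pointer directly in an `if` condition, which is the same shape as the nested `if` statements in the visitor.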
- -Here's the result of running the modified compiler on my Fortran test case: - -``` - In Leave for parser::Expr - Number of arguments: 0 - In Leave for parser::Expr - Number of arguments: 0 - In Leave for parser::Expr - Number of arguments: 0 - In Leave for parser::Expr - Number of arguments: 1 - Got an unwrapped Expr - Found a whole variable: ivar: ObjectEntity type: INTEGER(4) - INTENT(OUT) - In Leave for parser::Expr - Number of arguments: 0 -``` - -Sweet. - -## Emitting the messages -At this point, using the source location information from the original -`parser::Expr`, I had enough information to plug into the existing -interfaces for emitting messages for active DO variables. I modified the -compiler code accordingly: - - -```C++ - void DoChecker::Leave(const parser::Expr &parsedExpr) { - std::cout << "In Leave for parser::Expr\n"; - ActualArgumentSet argSet{CollectActualArguments(GetExpr(parsedExpr))}; - std::cout << "Number of arguments: " << argSet.size() << "\n"; - for (const evaluate::ActualArgumentRef &argRef : argSet) { - if (const SomeExpr * argExpr{argRef->UnwrapExpr()}) { - std::cout << "Got an unwrapped Expr\n"; - if (const Symbol * var{evaluate::UnwrapWholeSymbolDataRef(*argExpr)}) { - std::cout << "Found a whole variable: " << *var << "\n"; - common::Intent intent{argRef->dummyIntent()}; - switch (intent) { - case common::Intent::In: std::cout << "INTENT(IN)\n"; break; - case common::Intent::Out: - std::cout << "INTENT(OUT)\n"; - context_.CheckDoVarRedefine(parsedExpr.source, *var); - break; - case common::Intent::InOut: - std::cout << "INTENT(INOUT)\n"; - context_.WarnDoVarRedefine(parsedExpr.source, *var); - break; - default: std::cout << "default INTENT\n"; - } - } - } - } - } -``` - -I then ran this code on my test case, and miraculously, got the following -output: - -``` - In Leave for parser::Expr - Number of arguments: 0 - In Leave for parser::Expr - Number of arguments: 0 - In Leave for parser::Expr - Number of arguments: 0 - In Leave
for parser::Expr - Number of arguments: 1 - Got an unwrapped Expr - Found a whole variable: ivar: ObjectEntity type: INTEGER(4) - INTENT(OUT) - In Leave for parser::Expr - Number of arguments: 0 - testfun.f90:6:12: error: Cannot redefine DO variable 'ivar' - jvar = intentOutFunc(ivar) - ^^^^^^^^^^^^^^^^^^^ - testfun.f90:5:6: Enclosing DO construct - do ivar = 1, 10 - ^^^^ -``` - -Even sweeter. - -# Improving the test case -At this point, my implementation seemed to be working. But I was concerned -about the limitations of my test case. So I augmented it to include arguments -other than `INTENT(OUT)` and more complex expressions. Luckily, my -augmented test did not reveal any new problems. - -Here's the test I ended up with: - -```Fortran - subroutine s() - - Integer :: ivar, jvar - - ! This one is OK - do ivar = 1, 10 - jvar = intentInFunc(ivar) - end do - - ! Error for passing a DO variable to an INTENT(OUT) dummy - do ivar = 1, 10 - jvar = intentOutFunc(ivar) - end do - - ! Error for passing a DO variable to an INTENT(OUT) dummy, more complex - ! expression - do ivar = 1, 10 - jvar = 83 + intentInFunc(intentOutFunc(ivar)) - end do - - ! 
Warning for passing a DO variable to an INTENT(INOUT) dummy - do ivar = 1, 10 - jvar = intentInOutFunc(ivar) - end do - - contains - function intentInFunc(dummyArg) - integer, intent(in) :: dummyArg - integer :: intentInFunc - - intentInFunc = 343 - end function intentInFunc - - function intentOutFunc(dummyArg) - integer, intent(out) :: dummyArg - integer :: intentOutFunc - - dummyArg = 216 - intentOutFunc = 343 - end function intentOutFunc - - function intentInOutFunc(dummyArg) - integer, intent(inout) :: dummyArg - integer :: intentInOutFunc - - dummyArg = 216 - intentInOutFunc = 343 - end function intentInOutFunc - - end subroutine s -``` - -# Submitting the pull request -At this point, my implementation seemed functionally complete, so I stripped out all of the debug statements, ran `clang-format` on it, and reviewed it -to make sure that the names were clear. Here's what I ended up with: - -```C++ - void DoChecker::Leave(const parser::Expr &parsedExpr) { - ActualArgumentSet argSet{CollectActualArguments(GetExpr(parsedExpr))}; - for (const evaluate::ActualArgumentRef &argRef : argSet) { - if (const SomeExpr * argExpr{argRef->UnwrapExpr()}) { - if (const Symbol * var{evaluate::UnwrapWholeSymbolDataRef(*argExpr)}) { - common::Intent intent{argRef->dummyIntent()}; - switch (intent) { - case common::Intent::Out: - context_.CheckDoVarRedefine(parsedExpr.source, *var); - break; - case common::Intent::InOut: - context_.WarnDoVarRedefine(parsedExpr.source, *var); - break; - default:; // INTENT(IN) or default intent - } - } - } - } - } -``` - -I then created a pull request to get review comments. - -# Responding to pull request comments -I got feedback suggesting that I use an `if` statement rather than a -`switch` statement. Another comment reminded me that I should look at the -code I'd previously written to do a similar check for SUBROUTINE calls to see -if there was an opportunity to share code.
This examination resulted in - converting my existing code to the following pair of functions: - - -```C++ - static void CheckIfArgIsDoVar(const evaluate::ActualArgument &arg, - const parser::CharBlock location, SemanticsContext &context) { - common::Intent intent{arg.dummyIntent()}; - if (intent == common::Intent::Out || intent == common::Intent::InOut) { - if (const SomeExpr * argExpr{arg.UnwrapExpr()}) { - if (const Symbol * var{evaluate::UnwrapWholeSymbolDataRef(*argExpr)}) { - if (intent == common::Intent::Out) { - context.CheckDoVarRedefine(location, *var); - } else { - context.WarnDoVarRedefine(location, *var); // INTENT(INOUT) - } - } - } - } - } - - void DoChecker::Leave(const parser::Expr &parsedExpr) { - if (const SomeExpr * expr{GetExpr(parsedExpr)}) { - ActualArgumentSet argSet{CollectActualArguments(*expr)}; - for (const evaluate::ActualArgumentRef &argRef : argSet) { - CheckIfArgIsDoVar(*argRef, parsedExpr.source, context_); - } - } - } -``` - -The function `CheckIfArgIsDoVar()` was shared with the checks for DO -variables being passed to SUBROUTINE calls. - -At this point, my pull request was approved, and I merged it and deleted the -associated branch. diff --git a/flang/documentation/Intrinsics.md b/flang/documentation/Intrinsics.md deleted file mode 100644 --- a/flang/documentation/Intrinsics.md +++ /dev/null @@ -1,791 +0,0 @@ - - -# A categorization of standard (2018) and extended Fortran intrinsic procedures - -This note attempts to group the intrinsic procedures of Fortran into categories -of functions or subroutines with similar interfaces as an aid to -comprehension beyond that which might be gained from the standard's -alphabetical list. - -A brief status of intrinsic procedure support in f18 is also given at the end. - -Few procedures are actually described here apart from their interfaces; see the -Fortran 2018 standard (section 16) for the complete story. - -Intrinsic modules are not covered here. - -## General rules - -1. 
The value of any intrinsic function's `KIND` actual argument, if present, - must be a scalar constant integer expression, of any kind, whose value - resolves to some supported kind of the function's result type. - If optional and absent, the kind of the function's result is - either the default kind of that category or the kind of an argument - (e.g., as in `AINT`). -1. Procedures are summarized with a non-Fortran syntax for brevity. - Wherever a function has a short definition, it appears after an - equal sign as if it were a statement function. Any functions referenced - in these short summaries are intrinsic. -1. Unless stated otherwise, an actual argument may have any supported kind - of a particular intrinsic type. Sometimes a pattern variable - can appear in a description (e.g., `REAL(k)`) when the kind of an - actual argument's type must match the kind of another argument, or - determines the kind type parameter of the function result. -1. When an intrinsic type name appears without a kind (e.g., `REAL`), - it refers to the default kind of that type. Sometimes the word - `default` will appear for clarity. -1. The names of the dummy arguments actually matter because they can - be used as keywords for actual arguments. -1. All standard intrinsic functions are pure, even when not elemental. -1. Assumed-rank arguments may not appear as actual arguments unless - expressly permitted. -1. When an argument is described with a default value, e.g. `KIND=KIND(0)`, - it is an optional argument. Optional arguments without defaults, - e.g. `DIM` on many transformationals, are wrapped in `[]` brackets - as in the Fortran standard. When an intrinsic has optional arguments - with and without default values, the arguments with default values - may appear within the brackets to preserve the order of arguments - (e.g., `COUNT`).
- -# Elemental intrinsic functions - -Pure elemental semantics apply to these functions, to wit: when one or more of -the actual arguments are arrays, the arguments must be conformable, and -the result is also an array. -Scalar arguments are expanded when the arguments are not all scalars. - -## Elemental intrinsic functions that may have unrestricted specific procedures - -When an elemental intrinsic function is documented here as having an -_unrestricted specific name_, that name may be passed as an actual -argument, used as the target of a procedure pointer, appear in -a generic interface, and be otherwise used as if it were an external -procedure. -An `INTRINSIC` statement or attribute may have to be applied to an -unrestricted specific name to enable such usage. - -When a name is being used as a specific procedure for any purpose other -than that of a called function, the specific instance of the function -that accepts and returns values of the default kinds of the intrinsic -types is used. -A Fortran `INTERFACE` could be written to define each of -these unrestricted specific intrinsic function names. - -Calls to dummy arguments and procedure pointers that correspond to these -specific names must pass only scalar actual argument values. - -No other intrinsic function name can be passed as an actual argument, -used as a pointer target, appear in a generic interface, or be otherwise -used except as the name of a called function. -Some of these _restricted specific intrinsic functions_, e.g. `FLOAT`, -provide a means for invoking a corresponding generic (`REAL` in the case of `FLOAT`) -with forced argument and result kinds. -Others, viz. `CHAR`, `ICHAR`, `INT`, `REAL`, and the lexical comparisons like `LGE`, -have the same name as their generic functions, and it is not clear what purpose -is accomplished by the standard by defining them as specific functions. 
- -### Trigonometric elemental intrinsic functions, generic and (mostly) specific -All of these functions can be used as unrestricted specific names. - -``` -ACOS(REAL(k) X) -> REAL(k) -ASIN(REAL(k) X) -> REAL(k) -ATAN(REAL(k) X) -> REAL(k) -ATAN(REAL(k) Y, REAL(k) X) -> REAL(k) = ATAN2(Y, X) -ATAN2(REAL(k) Y, REAL(k) X) -> REAL(k) -COS(REAL(k) X) -> REAL(k) -COSH(REAL(k) X) -> REAL(k) -SIN(REAL(k) X) -> REAL(k) -SINH(REAL(k) X) -> REAL(k) -TAN(REAL(k) X) -> REAL(k) -TANH(REAL(k) X) -> REAL(k) -``` - -These `COMPLEX` versions of some of those functions, and the -inverse hyperbolic functions, cannot be used as specific names. -``` -ACOS(COMPLEX(k) X) -> COMPLEX(k) -ASIN(COMPLEX(k) X) -> COMPLEX(k) -ATAN(COMPLEX(k) X) -> COMPLEX(k) -ACOSH(REAL(k) X) -> REAL(k) -ACOSH(COMPLEX(k) X) -> COMPLEX(k) -ASINH(REAL(k) X) -> REAL(k) -ASINH(COMPLEX(k) X) -> COMPLEX(k) -ATANH(REAL(k) X) -> REAL(k) -ATANH(COMPLEX(k) X) -> COMPLEX(k) -COS(COMPLEX(k) X) -> COMPLEX(k) -COSH(COMPLEX(k) X) -> COMPLEX(k) -SIN(COMPLEX(k) X) -> COMPLEX(k) -SINH(COMPLEX(k) X) -> COMPLEX(k) -TAN(COMPLEX(k) X) -> COMPLEX(k) -TANH(COMPLEX(k) X) -> COMPLEX(k) -``` - -### Non-trigonometric elemental intrinsic functions, generic and specific -These functions *can* be used as unrestricted specific names. 
-``` -ABS(REAL(k) A) -> REAL(k) = SIGN(A, 0.0) -AIMAG(COMPLEX(k) Z) -> REAL(k) = Z%IM -AINT(REAL(k) A, KIND=k) -> REAL(KIND) -ANINT(REAL(k) A, KIND=k) -> REAL(KIND) -CONJG(COMPLEX(k) Z) -> COMPLEX(k) = CMPLX(Z%RE, -Z%IM) -DIM(REAL(k) X, REAL(k) Y) -> REAL(k) = X-MIN(X,Y) -DPROD(default REAL X, default REAL Y) -> DOUBLE PRECISION = DBLE(X)*DBLE(Y) -EXP(REAL(k) X) -> REAL(k) -INDEX(CHARACTER(k) STRING, CHARACTER(k) SUBSTRING, LOGICAL(any) BACK=.FALSE., KIND=KIND(0)) -> INTEGER(KIND) -LEN(CHARACTER(k,n) STRING, KIND=KIND(0)) -> INTEGER(KIND) = n -LOG(REAL(k) X) -> REAL(k) -LOG10(REAL(k) X) -> REAL(k) -MOD(INTEGER(k) A, INTEGER(k) P) -> INTEGER(k) = A-P*INT(A/P) -NINT(REAL(k) A, KIND=KIND(0)) -> INTEGER(KIND) -SIGN(REAL(k) A, REAL(k) B) -> REAL(k) -SQRT(REAL(k) X) -> REAL(k) = X ** 0.5 -``` - -These variants, however *cannot* be used as specific names without recourse to an alias -from the following section: -``` -ABS(INTEGER(k) A) -> INTEGER(k) = SIGN(A, 0) -ABS(COMPLEX(k) A) -> REAL(k) = HYPOT(A%RE, A%IM) -DIM(INTEGER(k) X, INTEGER(k) Y) -> INTEGER(k) = X-MIN(X,Y) -EXP(COMPLEX(k) X) -> COMPLEX(k) -LOG(COMPLEX(k) X) -> COMPLEX(k) -MOD(REAL(k) A, REAL(k) P) -> REAL(k) = A-P*INT(A/P) -SIGN(INTEGER(k) A, INTEGER(k) B) -> INTEGER(k) -SQRT(COMPLEX(k) X) -> COMPLEX(k) -``` - -### Unrestricted specific aliases for some elemental intrinsic functions with distinct names - -``` -ALOG(REAL X) -> REAL = LOG(X) -ALOG10(REAL X) -> REAL = LOG10(X) -AMOD(REAL A, REAL P) -> REAL = MOD(A, P) -CABS(COMPLEX A) = ABS(A) -CCOS(COMPLEX X) = COS(X) -CEXP(COMPLEX A) -> COMPLEX = EXP(A) -CLOG(COMPLEX X) -> COMPLEX = LOG(X) -CSIN(COMPLEX X) -> COMPLEX = SIN(X) -CSQRT(COMPLEX X) -> COMPLEX = SQRT(X) -CTAN(COMPLEX X) -> COMPLEX = TAN(X) -DABS(DOUBLE PRECISION A) -> DOUBLE PRECISION = ABS(A) -DACOS(DOUBLE PRECISION X) -> DOUBLE PRECISION = ACOS(X) -DASIN(DOUBLE PRECISION X) -> DOUBLE PRECISION = ASIN(X) -DATAN(DOUBLE PRECISION X) -> DOUBLE PRECISION = ATAN(X) -DATAN2(DOUBLE PRECISION Y, DOUBLE 
PRECISION X) -> DOUBLE PRECISION = ATAN2(Y, X) -DCOS(DOUBLE PRECISION X) -> DOUBLE PRECISION = COS(X) -DCOSH(DOUBLE PRECISION X) -> DOUBLE PRECISION = COSH(X) -DDIM(DOUBLE PRECISION X, DOUBLE PRECISION Y) -> DOUBLE PRECISION = X-MIN(X,Y) -DEXP(DOUBLE PRECISION X) -> DOUBLE PRECISION = EXP(X) -DINT(DOUBLE PRECISION A) -> DOUBLE PRECISION = AINT(A) -DLOG(DOUBLE PRECISION X) -> DOUBLE PRECISION = LOG(X) -DLOG10(DOUBLE PRECISION X) -> DOUBLE PRECISION = LOG10(X) -DMOD(DOUBLE PRECISION A, DOUBLE PRECISION P) -> DOUBLE PRECISION = MOD(A, P) -DNINT(DOUBLE PRECISION A) -> DOUBLE PRECISION = ANINT(A) -DSIGN(DOUBLE PRECISION A, DOUBLE PRECISION B) -> DOUBLE PRECISION = SIGN(A, B) -DSIN(DOUBLE PRECISION X) -> DOUBLE PRECISION = SIN(X) -DSINH(DOUBLE PRECISION X) -> DOUBLE PRECISION = SINH(X) -DSQRT(DOUBLE PRECISION X) -> DOUBLE PRECISION = SQRT(X) -DTAN(DOUBLE PRECISION X) -> DOUBLE PRECISION = TAN(X) -DTANH(DOUBLE PRECISION X) -> DOUBLE PRECISION = TANH(X) -IABS(INTEGER A) -> INTEGER = ABS(A) -IDIM(INTEGER X, INTEGER Y) -> INTEGER = X-MIN(X,Y) -IDNINT(DOUBLE PRECISION A) -> INTEGER = NINT(A) -ISIGN(INTEGER A, INTEGER B) -> INTEGER = SIGN(A, B) -``` - -## Generic elemental intrinsic functions without specific names - -(No procedures after this point can be passed as actual arguments, used as -pointer targets, or appear as specific procedures in generic interfaces.) 
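The statement-function style definitions used in the summaries above can be read as ordinary arithmetic. As a rough sketch for default-kind integers only, the following C++ functions mirror `MOD(A,P) = A-P*INT(A/P)` and `DIM(X,Y) = X-MIN(X,Y)`. The names `ModLike` and `DimLike` are made up for this note, and C++ integer division is assumed to play the role of `INT(A/P)` because it also truncates toward zero.

```cpp
#include <algorithm>
#include <cassert>

// MOD(A, P) = A - P*INT(A/P); C++ '/' on ints truncates toward zero.
// P must be nonzero, as in Fortran.
inline int ModLike(int a, int p) { return a - p * (a / p); }

// DIM(X, Y) = X - MIN(X, Y): the positive difference, never negative.
inline int DimLike(int x, int y) { return x - std::min(x, y); }
```

With this reading, the result of `ModLike` takes the sign of its first argument, and `DimLike` is zero whenever its first argument does not exceed its second.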
- -### Elemental conversions - -``` -ACHAR(INTEGER(k) I, KIND=KIND('')) -> CHARACTER(KIND,LEN=1) -CEILING(REAL() A, KIND=KIND(0)) -> INTEGER(KIND) -CHAR(INTEGER(any) I, KIND=KIND('')) -> CHARACTER(KIND,LEN=1) -CMPLX(COMPLEX(k) X, KIND=KIND(0.0D0)) -> COMPLEX(KIND) -CMPLX(INTEGER or REAL or BOZ X, INTEGER or REAL or BOZ Y=0, KIND=KIND((0,0))) -> COMPLEX(KIND) -DBLE(INTEGER or REAL or COMPLEX or BOZ A) = REAL(A, KIND=KIND(0.0D0)) -EXPONENT(REAL(any) X) -> default INTEGER -FLOOR(REAL(any) A, KIND=KIND(0)) -> INTEGER(KIND) -IACHAR(CHARACTER(KIND=k,LEN=1) C, KIND=KIND(0)) -> INTEGER(KIND) -ICHAR(CHARACTER(KIND=k,LEN=1) C, KIND=KIND(0)) -> INTEGER(KIND) -INT(INTEGER or REAL or COMPLEX or BOZ A, KIND=KIND(0)) -> INTEGER(KIND) -LOGICAL(LOGICAL(any) L, KIND=KIND(.TRUE.)) -> LOGICAL(KIND) -REAL(INTEGER or REAL or COMPLEX or BOZ A, KIND=KIND(0.0)) -> REAL(KIND) -``` - -### Other generic elemental intrinsic functions without specific names -N.B. `BESSEL_JN(N1, N2, X)` and `BESSEL_YN(N1, N2, X)` are categorized -below with the _transformational_ intrinsic functions. - -``` -BESSEL_J0(REAL(k) X) -> REAL(k) -BESSEL_J1(REAL(k) X) -> REAL(k) -BESSEL_JN(INTEGER(n) N, REAL(k) X) -> REAL(k) -BESSEL_Y0(REAL(k) X) -> REAL(k) -BESSEL_Y1(REAL(k) X) -> REAL(k) -BESSEL_YN(INTEGER(n) N, REAL(k) X) -> REAL(k) -ERF(REAL(k) X) -> REAL(k) -ERFC(REAL(k) X) -> REAL(k) -ERFC_SCALED(REAL(k) X) -> REAL(k) -FRACTION(REAL(k) X) -> REAL(k) -GAMMA(REAL(k) X) -> REAL(k) -HYPOT(REAL(k) X, REAL(k) Y) -> REAL(k) = SQRT(X*X+Y*Y) without spurious overflow -IMAGE_STATUS(INTEGER(any) IMAGE [, scalar TEAM_TYPE TEAM ]) -> default INTEGER -IS_IOSTAT_END(INTEGER(any) I) -> default LOGICAL -IS_IOSTAT_EOR(INTEGER(any) I) -> default LOGICAL -LOG_GAMMA(REAL(k) X) -> REAL(k) -MAX(INTEGER(k) ...) -> INTEGER(k) -MAX(REAL(k) ...) -> REAL(k) -MAX(CHARACTER(KIND=k) ...) -> CHARACTER(KIND=k,LEN=MAX(LEN(...))) -MERGE(any type TSOURCE, same type FSOURCE, LOGICAL(any) MASK) -> type of FSOURCE -MIN(INTEGER(k) ...) 
-> INTEGER(k) -MIN(REAL(k) ...) -> REAL(k) -MIN(CHARACTER(KIND=k) ...) -> CHARACTER(KIND=k,LEN=MAX(LEN(...))) -MODULO(INTEGER(k) A, INTEGER(k) P) -> INTEGER(k); P*result >= 0 -MODULO(REAL(k) A, REAL(k) P) -> REAL(k) = A - P*FLOOR(A/P) -NEAREST(REAL(k) X, REAL(any) S) -> REAL(k) -OUT_OF_RANGE(INTEGER(any) X, scalar INTEGER or REAL(k) MOLD) -> default LOGICAL -OUT_OF_RANGE(REAL(any) X, scalar REAL(k) MOLD) -> default LOGICAL -OUT_OF_RANGE(REAL(any) X, scalar INTEGER(any) MOLD, scalar LOGICAL(any) ROUND=.FALSE.) -> default LOGICAL -RRSPACING(REAL(k) X) -> REAL(k) -SCALE(REAL(k) X, INTEGER(any) I) -> REAL(k) -SET_EXPONENT(REAL(k) X, INTEGER(any) I) -> REAL(k) -SPACING(REAL(k) X) -> REAL(k) -``` - -### Restricted specific aliases for elemental conversions &/or extrema with default intrinsic types - -``` -AMAX0(INTEGER ...) = REAL(MAX(...)) -AMAX1(REAL ...) = MAX(...) -AMIN0(INTEGER...) = REAL(MIN(...)) -AMIN1(REAL ...) = MIN(...) -DMAX1(DOUBLE PRECISION ...) = MAX(...) -DMIN1(DOUBLE PRECISION ...) = MIN(...) -FLOAT(INTEGER I) = REAL(I) -IDINT(DOUBLE PRECISION A) = INT(A) -IFIX(REAL A) = INT(A) -MAX0(INTEGER ...) = MAX(...) -MAX1(REAL ...) = INT(MAX(...)) -MIN0(INTEGER ...) = MIN(...) -MIN1(REAL ...) = INT(MIN(...)) -SNGL(DOUBLE PRECISION A) = REAL(A) -``` - -### Generic elemental bit manipulation intrinsic functions -Many of these accept a typeless "BOZ" literal as an actual argument. -It is interpreted as having the kind of intrinsic `INTEGER` type -as another argument, as if the typeless were implicitly wrapped -in a call to `INT()`. -When multiple arguments can be either `INTEGER` values or typeless -constants, it is forbidden for *all* of them to be typeless -constants if the result of the function is `INTEGER` -(i.e., only `BGE`, `BGT`, `BLE`, and `BLT` can have multiple -typeless arguments). 
- -``` -BGE(INTEGER(n1) or BOZ I, INTEGER(n2) or BOZ J) -> default LOGICAL -BGT(INTEGER(n1) or BOZ I, INTEGER(n2) or BOZ J) -> default LOGICAL -BLE(INTEGER(n1) or BOZ I, INTEGER(n2) or BOZ J) -> default LOGICAL -BLT(INTEGER(n1) or BOZ I, INTEGER(n2) or BOZ J) -> default LOGICAL -BTEST(INTEGER(n1) I, INTEGER(n2) POS) -> default LOGICAL -DSHIFTL(INTEGER(k) I, INTEGER(k) or BOZ J, INTEGER(any) SHIFT) -> INTEGER(k) -DSHIFTL(BOZ I, INTEGER(k), INTEGER(any) SHIFT) -> INTEGER(k) -DSHIFTR(INTEGER(k) I, INTEGER(k) or BOZ J, INTEGER(any) SHIFT) -> INTEGER(k) -DSHIFTR(BOZ I, INTEGER(k), INTEGER(any) SHIFT) -> INTEGER(k) -IAND(INTEGER(k) I, INTEGER(k) or BOZ J) -> INTEGER(k) -IAND(BOZ I, INTEGER(k) J) -> INTEGER(k) -IBCLR(INTEGER(k) I, INTEGER(any) POS) -> INTEGER(k) -IBITS(INTEGER(k) I, INTEGER(n1) POS, INTEGER(n2) LEN) -> INTEGER(k) -IBSET(INTEGER(k) I, INTEGER(any) POS) -> INTEGER(k) -IEOR(INTEGER(k) I, INTEGER(k) or BOZ J) -> INTEGER(k) -IEOR(BOZ I, INTEGER(k) J) -> INTEGER(k) -IOR(INTEGER(k) I, INTEGER(k) or BOZ J) -> INTEGER(k) -IOR(BOZ I, INTEGER(k) J) -> INTEGER(k) -ISHFT(INTEGER(k) I, INTEGER(any) SHIFT) -> INTEGER(k) -ISHFTC(INTEGER(k) I, INTEGER(n1) SHIFT, INTEGER(n2) SIZE=BIT_SIZE(I)) -> INTEGER(k) -LEADZ(INTEGER(any) I) -> default INTEGER -MASKL(INTEGER(any) I, KIND=KIND(0)) -> INTEGER(KIND) -MASKR(INTEGER(any) I, KIND=KIND(0)) -> INTEGER(KIND) -MERGE_BITS(INTEGER(k) I, INTEGER(k) or BOZ J, INTEGER(k) or BOZ MASK) = IOR(IAND(I,MASK),IAND(J,NOT(MASK))) -MERGE_BITS(BOZ I, INTEGER(k) J, INTEGER(k) or BOZ MASK) = IOR(IAND(I,MASK),IAND(J,NOT(MASK))) -NOT(INTEGER(k) I) -> INTEGER(k) -POPCNT(INTEGER(any) I) -> default INTEGER -POPPAR(INTEGER(any) I) -> default INTEGER = IAND(POPCNT(I), Z'1') -SHIFTA(INTEGER(k) I, INTEGER(any) SHIFT) -> INTEGER(k) -SHIFTL(INTEGER(k) I, INTEGER(any) SHIFT) -> INTEGER(k) -SHIFTR(INTEGER(k) I, INTEGER(any) SHIFT) -> INTEGER(k) -TRAILZ(INTEGER(any) I) -> default INTEGER -``` - -### Character elemental intrinsic functions -See also `INDEX` and 
`LEN` above among the elemental intrinsic functions with -unrestricted specific names. -``` -ADJUSTL(CHARACTER(k,LEN=n) STRING) -> CHARACTER(k,LEN=n) -ADJUSTR(CHARACTER(k,LEN=n) STRING) -> CHARACTER(k,LEN=n) -LEN_TRIM(CHARACTER(k,n) STRING, KIND=KIND(0)) -> INTEGER(KIND) = n -LGE(CHARACTER(k,n1) STRING_A, CHARACTER(k,n2) STRING_B) -> default LOGICAL -LGT(CHARACTER(k,n1) STRING_A, CHARACTER(k,n2) STRING_B) -> default LOGICAL -LLE(CHARACTER(k,n1) STRING_A, CHARACTER(k,n2) STRING_B) -> default LOGICAL -LLT(CHARACTER(k,n1) STRING_A, CHARACTER(k,n2) STRING_B) -> default LOGICAL -SCAN(CHARACTER(k,n) STRING, CHARACTER(k,m) SET, LOGICAL(any) BACK=.FALSE., KIND=KIND(0)) -> INTEGER(KIND) -VERIFY(CHARACTER(k,n) STRING, CHARACTER(k,m) SET, LOGICAL(any) BACK=.FALSE., KIND=KIND(0)) -> INTEGER(KIND) -``` - -`SCAN` returns the index of the first (or last, if `BACK=.TRUE.`) character in `STRING` -that is present in `SET`, or zero if none is. - -`VERIFY` is essentially the opposite: it returns the index of the first (or last) character -in `STRING` that is *not* present in `SET`, or zero if all are. - -# Transformational intrinsic functions - -This category comprises a large collection of intrinsic functions that -are collected together because they somehow transform their arguments -in a way that prevents them from being elemental. -All of them are pure, however. - -Some general rules apply to the transformational intrinsic functions: - -1. `DIM` arguments are optional; if present, the actual argument must be - a scalar integer of any kind. -1. When an optional `DIM` argument is absent, or an `ARRAY` or `MASK` - argument is a vector, the result of the function is scalar; otherwise, - the result is an array of the same shape as the `ARRAY` or `MASK` - argument with the dimension `DIM` removed from the shape. -1. When a function takes an optional `MASK` argument, it must be conformable - with its `ARRAY` argument if it is present, and the mask can be any kind - of `LOGICAL`. 
It can be scalar. -1. The type `numeric` here can be any kind of `INTEGER`, `REAL`, or `COMPLEX`. -1. The type `relational` here can be any kind of `INTEGER`, `REAL`, or `CHARACTER`. -1. The type `any` here denotes any intrinsic or derived type. -1. The notation `(..)` denotes an array of any rank (but not an assumed-rank array). - -## Logical reduction transformational intrinsic functions -``` -ALL(LOGICAL(k) MASK(..) [, DIM ]) -> LOGICAL(k) -ANY(LOGICAL(k) MASK(..) [, DIM ]) -> LOGICAL(k) -COUNT(LOGICAL(any) MASK(..) [, DIM, KIND=KIND(0) ]) -> INTEGER(KIND) -PARITY(LOGICAL(k) MASK(..) [, DIM ]) -> LOGICAL(k) -``` - -## Numeric reduction transformational intrinsic functions -``` -IALL(INTEGER(k) ARRAY(..) [, DIM, MASK ]) -> INTEGER(k) -IANY(INTEGER(k) ARRAY(..) [, DIM, MASK ]) -> INTEGER(k) -IPARITY(INTEGER(k) ARRAY(..) [, DIM, MASK ]) -> INTEGER(k) -NORM2(REAL(k) X(..) [, DIM ]) -> REAL(k) -PRODUCT(numeric ARRAY(..) [, DIM, MASK ]) -> numeric -SUM(numeric ARRAY(..) [, DIM, MASK ]) -> numeric -``` - -`NORM2` generalizes `HYPOT` by computing `SQRT(SUM(X*X))` while avoiding spurious overflows. - -## Extrema reduction transformational intrinsic functions -``` -MAXVAL(relational(k) ARRAY(..) [, DIM, MASK ]) -> relational(k) -MINVAL(relational(k) ARRAY(..) [, DIM, MASK ]) -> relational(k) -``` - -### Locational transformational intrinsic functions -When the optional `DIM` argument is absent, the result is an `INTEGER(KIND)` -vector whose length is the rank of `ARRAY`. -When the optional `DIM` argument is present, the result is an `INTEGER(KIND)` -array of rank `RANK(ARRAY)-1` and shape equal to that of `ARRAY` with -the dimension `DIM` removed. - -The optional `BACK` argument is a scalar LOGICAL value of any kind. -When present and `.TRUE.`, it causes the function to return the index -of the *last* occurrence of the target or extreme value.
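
As a small illustration of the `DIM` and `BACK` rules above, the following sketch contrasts the vector-valued and scalar results of the locational intrinsics. The program name and the array contents are made up for this note; they are not taken from the f18 sources.

```fortran
program loc_demo
  implicit none
  integer :: a(5) = [3, 7, 1, 7, 2]

  ! DIM absent: the result is a vector whose length is RANK(a) = 1.
  print *, maxloc(a)                ! first occurrence of the maximum: 2
  print *, maxloc(a, back=.true.)   ! last occurrence of the maximum: 4

  ! DIM present: the result has rank RANK(a)-1 = 0, so it is a scalar.
  print *, findloc(a, 7, dim=1)               ! 2
  print *, findloc(a, 7, dim=1, back=.true.)  ! 4
end program loc_demo
```

For a rank-1 array the two forms carry the same information; the difference in result rank only becomes visible to callers that, for example, subscript the result.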
- -For `FINDLOC`, `ARRAY` may have any of the five intrinsic types, and `VALUE` -must be a scalar value of a type for which `ARRAY==VALUE` or `ARRAY .EQV. VALUE` -is an acceptable expression. - -``` -FINDLOC(intrinsic ARRAY(..), scalar VALUE [, DIM, MASK, KIND=KIND(0), BACK=.FALSE. ]) -MAXLOC(relational ARRAY(..) [, DIM, MASK, KIND=KIND(0), BACK=.FALSE. ]) -MINLOC(relational ARRAY(..) [, DIM, MASK, KIND=KIND(0), BACK=.FALSE. ]) -``` - -## Data rearrangement transformational intrinsic functions -The optional `DIM` argument to these functions must be a scalar integer of -any kind, and it takes a default value of 1 when absent. - -``` -CSHIFT(any ARRAY(..), INTEGER(any) SHIFT(..) [, DIM ]) -> same type/kind/shape as ARRAY -``` -Either `SHIFT` is scalar or `RANK(SHIFT) == RANK(ARRAY) - 1` and `SHAPE(SHIFT)` is that of `SHAPE(ARRAY)` with element `DIM` removed. - -``` -EOSHIFT(any ARRAY(..), INTEGER(any) SHIFT(..) [, BOUNDARY, DIM ]) -> same type/kind/shape as ARRAY -``` -* `SHIFT` is scalar or `RANK(SHIFT) == RANK(ARRAY) - 1` and `SHAPE(SHIFT)` is that of `SHAPE(ARRAY)` with element `DIM` removed. -* If `BOUNDARY` is present, it must have the same type and parameters as `ARRAY`. -* If `BOUNDARY` is absent, `ARRAY` must be of an intrinsic type, and the default `BOUNDARY` is the obvious `0`, `' '`, or `.FALSE.` value of `KIND(ARRAY)`. -* If `BOUNDARY` is present, either it is scalar, or `RANK(BOUNDARY) == RANK(ARRAY) - 1` and `SHAPE(BOUNDARY)` is that of `SHAPE(ARRAY)` with element `DIM` - removed. - -``` -PACK(any ARRAY(..), LOGICAL(any) MASK(..)) -> vector of same type and kind as ARRAY -``` -* `MASK` is conformable with `ARRAY` and may be scalar. -* The length of the result vector is `COUNT(MASK)` if `MASK` is an array, else `SIZE(ARRAY)` if `MASK` is `.TRUE.`, else zero. - -``` -PACK(any ARRAY(..), LOGICAL(any) MASK(..), any VECTOR(n)) -> vector of same type, kind, and size as VECTOR -``` -* `MASK` is conformable with `ARRAY` and may be scalar.
-* `VECTOR` has the same type and kind as `ARRAY`. -* `VECTOR` must not be smaller than result of `PACK` with no `VECTOR` argument. -* The leading elements of `VECTOR` are replaced with elements from `ARRAY` as - if `PACK` had been invoked without `VECTOR`. - -``` -RESHAPE(any SOURCE(..), INTEGER(k) SHAPE(n) [, PAD(..), INTEGER(k2) ORDER(n) ]) -> SOURCE array with shape SHAPE -``` -* If `ORDER` is present, it is a vector of the same size as `SHAPE`, and - contains a permutation. -* The element(s) of `PAD` are used to fill out the result once `SOURCE` - has been consumed. - -``` -SPREAD(any SOURCE, DIM, scalar INTEGER(any) NCOPIES) -> same type as SOURCE, rank=RANK(SOURCE)+1 -TRANSFER(any SOURCE, any MOLD) -> scalar if MOLD is scalar, else vector; same type and kind as MOLD -TRANSFER(any SOURCE, any MOLD, scalar INTEGER(any) SIZE) -> vector(SIZE) of type and kind of MOLD -TRANSPOSE(any MATRIX(n,m)) -> matrix(m,n) of same type and kind as MATRIX -``` - -The shape of the result of `SPREAD` is the same as that of `SOURCE`, with `NCOPIES` inserted -at position `DIM`. - -``` -UNPACK(any VECTOR(n), LOGICAL(any) MASK(..), FIELD) -> type and kind of VECTOR, shape of MASK -``` -`FIELD` has same type and kind as `VECTOR` and is conformable with `MASK`. - -## Other transformational intrinsic functions -``` -BESSEL_JN(INTEGER(n1) N1, INTEGER(n2) N2, REAL(k) X) -> REAL(k) vector (MAX(N2-N1+1,0)) -BESSEL_YN(INTEGER(n1) N1, INTEGER(n2) N2, REAL(k) X) -> REAL(k) vector (MAX(N2-N1+1,0)) -COMMAND_ARGUMENT_COUNT() -> scalar default INTEGER -DOT_PRODUCT(LOGICAL(k) VECTOR_A(n), LOGICAL(k) VECTOR_B(n)) -> LOGICAL(k) = ANY(VECTOR_A .AND. 
VECTOR_B) -DOT_PRODUCT(COMPLEX(any) VECTOR_A(n), numeric VECTOR_B(n)) = SUM(CONJG(VECTOR_A) * VECTOR_B) -DOT_PRODUCT(INTEGER(any) or REAL(any) VECTOR_A(n), numeric VECTOR_B(n)) = SUM(VECTOR_A * VECTOR_B) -MATMUL(numeric ARRAY_A(j), numeric ARRAY_B(j,k)) -> numeric vector(k) -MATMUL(numeric ARRAY_A(j,k), numeric ARRAY_B(k)) -> numeric vector(j) -MATMUL(numeric ARRAY_A(j,k), numeric ARRAY_B(k,m)) -> numeric matrix(j,m) -MATMUL(LOGICAL(n1) ARRAY_A(j), LOGICAL(n2) ARRAY_B(j,k)) -> LOGICAL vector(k) -MATMUL(LOGICAL(n1) ARRAY_A(j,k), LOGICAL(n2) ARRAY_B(k)) -> LOGICAL vector(j) -MATMUL(LOGICAL(n1) ARRAY_A(j,k), LOGICAL(n2) ARRAY_B(k,m)) -> LOGICAL matrix(j,m) -NULL([POINTER/ALLOCATABLE MOLD]) -> POINTER -REDUCE(any ARRAY(..), function OPERATION [, DIM, LOGICAL(any) MASK(..), IDENTITY, LOGICAL ORDERED=.FALSE. ]) -REPEAT(CHARACTER(k,n) STRING, INTEGER(any) NCOPIES) -> CHARACTER(k,n*NCOPIES) -SELECTED_CHAR_KIND('DEFAULT' or 'ASCII' or 'ISO_10646' or ...) -> scalar default INTEGER -SELECTED_INT_KIND(scalar INTEGER(any) R) -> scalar default INTEGER -SELECTED_REAL_KIND([scalar INTEGER(any) P, scalar INTEGER(any) R, scalar INTEGER(any) RADIX]) -> scalar default INTEGER -SHAPE(SOURCE, KIND=KIND(0)) -> INTEGER(KIND)(RANK(SOURCE)) -TRIM(CHARACTER(k,n) STRING) -> CHARACTER(k) -``` - -The type and kind of the result of a numeric `MATMUL` is the same as would result from -a multiplication of an element of ARRAY_A and an element of ARRAY_B. - -The kind of the `LOGICAL` result of a `LOGICAL` `MATMUL` is the same as would result -from an intrinsic `.AND.` operation between an element of `ARRAY_A` and an element -of `ARRAY_B`. - -Note that `DOT_PRODUCT` with a `COMPLEX` first argument operates on its complex conjugate, -but that `MATMUL` with a `COMPLEX` argument does not. - -The `MOLD` argument to `NULL` may be omitted only in a context where the type of the pointer is known, -such as an initializer or pointer assignment statement. 
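
Returning to the note above on `COMPLEX` arguments, the difference between `DOT_PRODUCT` and `MATMUL` can be seen with one-element operands holding the imaginary unit. This is only a sketch; the program and variable names are hypothetical.

```fortran
program conj_demo
  implicit none
  complex :: a(1), b(1), m(1,1)

  a = cmplx(0.0, 1.0)   ! the imaginary unit i
  b = a
  m = cmplx(0.0, 1.0)   ! 1x1 matrix holding i

  ! DOT_PRODUCT conjugates its first argument: CONJG(i)*i = 1
  print *, dot_product(a, b)

  ! MATMUL does not conjugate: i*i = -1
  print *, matmul(m, b)
end program conj_demo
```

The two results differ in sign, which is why a `MATMUL` of a row vector by a column vector cannot be treated as interchangeable with `DOT_PRODUCT` when the first operand is `COMPLEX`.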
- -At least one argument must be present in a call to `SELECTED_REAL_KIND`. - -An assumed-rank array may be passed to `SHAPE`, and if it is associated with an assumed-size array, -the last element of the result will be -1. - -## Coarray transformational intrinsic functions -``` -FAILED_IMAGES([scalar TEAM_TYPE TEAM, KIND=KIND(0)]) -> INTEGER(KIND) vector -GET_TEAM([scalar INTEGER(?) LEVEL]) -> scalar TEAM_TYPE -IMAGE_INDEX(COARRAY, INTEGER(any) SUB(n) [, scalar TEAM_TYPE TEAM ]) -> scalar default INTEGER -IMAGE_INDEX(COARRAY, INTEGER(any) SUB(n), scalar INTEGER(any) TEAM_NUMBER) -> scalar default INTEGER -NUM_IMAGES([scalar TEAM_TYPE TEAM]) -> scalar default INTEGER -NUM_IMAGES(scalar INTEGER(any) TEAM_NUMBER) -> scalar default INTEGER -STOPPED_IMAGES([scalar TEAM_TYPE TEAM, KIND=KIND(0)]) -> INTEGER(KIND) vector -TEAM_NUMBER([scalar TEAM_TYPE TEAM]) -> scalar default INTEGER -THIS_IMAGE([COARRAY, DIM, scalar TEAM_TYPE TEAM]) -> default INTEGER -``` -The result of `THIS_IMAGE` is a scalar if `DIM` is present or if `COARRAY` is absent, -and a vector whose length is the corank of `COARRAY` otherwise. - -# Inquiry intrinsic functions -These are neither elemental nor transformational; all are pure. - -## Type inquiry intrinsic functions -All of these functions return constants. -The value of the argument is not used, and may well be undefined. 
-``` -BIT_SIZE(INTEGER(k) I(..)) -> INTEGER(k) -DIGITS(INTEGER or REAL X(..)) -> scalar default INTEGER -EPSILON(REAL(k) X(..)) -> scalar REAL(k) -HUGE(INTEGER(k) X(..)) -> scalar INTEGER(k) -HUGE(REAL(k) X(..)) -> scalar of REAL(k) -KIND(intrinsic X(..)) -> scalar default INTEGER -MAXEXPONENT(REAL(k) X(..)) -> scalar default INTEGER -MINEXPONENT(REAL(k) X(..)) -> scalar default INTEGER -NEW_LINE(CHARACTER(k,n) A(..)) -> scalar CHARACTER(k,1) = CHAR(10) -PRECISION(REAL(k) or COMPLEX(k) X(..)) -> scalar default INTEGER -RADIX(INTEGER(k) or REAL(k) X(..)) -> scalar default INTEGER, always 2 -RANGE(INTEGER(k) or REAL(k) or COMPLEX(k) X(..)) -> scalar default INTEGER -TINY(REAL(k) X(..)) -> scalar REAL(k) -``` - -## Bound and size inquiry intrinsic functions -The results are scalar when `DIM` is present, and a vector of length=(co)rank(`(CO)ARRAY`) -when `DIM` is absent, except that `SIZE` without `DIM` returns the total number of elements as a scalar. -``` -LBOUND(any ARRAY(..) [, DIM, KIND=KIND(0) ]) -> INTEGER(KIND) -LCOBOUND(any COARRAY [, DIM, KIND=KIND(0) ]) -> INTEGER(KIND) -SIZE(any ARRAY(..) [, DIM, KIND=KIND(0) ]) -> INTEGER(KIND) -UBOUND(any ARRAY(..) [, DIM, KIND=KIND(0) ]) -> INTEGER(KIND) -UCOBOUND(any COARRAY [, DIM, KIND=KIND(0) ]) -> INTEGER(KIND) -``` - -Assumed-rank arrays may be used with `LBOUND`, `SIZE`, and `UBOUND`.
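
A short sketch of how the result rank of the bound inquiries depends on `DIM` may help here. The array and its bounds are invented for illustration. Note that `SIZE` behaves differently from `LBOUND` and `UBOUND`: without `DIM` it yields the total element count as a scalar.

```fortran
program bounds_demo
  implicit none
  real :: a(0:2, -1:3)

  ! DIM absent: LBOUND and UBOUND return vectors of length RANK(a) = 2.
  print *, lbound(a)          ! 0 -1
  print *, ubound(a)          ! 2 3

  ! DIM present: the result is a scalar.
  print *, ubound(a, dim=2)   ! 3

  ! SIZE is scalar in both forms.
  print *, size(a)            ! 15 (3 * 5 elements)
  print *, size(a, dim=1)     ! 3
end program bounds_demo
```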
- -## Object characteristic inquiry intrinsic functions -``` -ALLOCATED(any type ALLOCATABLE ARRAY) -> scalar default LOGICAL -ALLOCATED(any type ALLOCATABLE SCALAR) -> scalar default LOGICAL -ASSOCIATED(any type POINTER POINTER [, same type TARGET]) -> scalar default LOGICAL -COSHAPE(COARRAY, KIND=KIND(0)) -> INTEGER(KIND) vector of length corank(COARRAY) -EXTENDS_TYPE_OF(A, MOLD) -> default LOGICAL -IS_CONTIGUOUS(any data ARRAY(..)) -> scalar default LOGICAL -PRESENT(OPTIONAL A) -> scalar default LOGICAL -RANK(any data A) -> scalar default INTEGER = 0 if A is scalar, SIZE(SHAPE(A)) if A is an array, rank if assumed-rank -SAME_TYPE_AS(A, B) -> scalar default LOGICAL -STORAGE_SIZE(any data A, KIND=KIND(0)) -> INTEGER(KIND) -``` -The arguments to `EXTENDS_TYPE_OF` must be of extensible derived types or be unlimited polymorphic. - -An assumed-rank array may be used with `IS_CONTIGUOUS` and `RANK`. - -# Intrinsic subroutines - -(*TODO*: complete these descriptions) - -## One elemental intrinsic subroutine -``` -INTERFACE - SUBROUTINE MVBITS(FROM, FROMPOS, LEN, TO, TOPOS) - INTEGER(k1) :: FROM, TO - INTENT(IN) :: FROM - INTENT(INOUT) :: TO - INTEGER(k2), INTENT(IN) :: FROMPOS - INTEGER(k3), INTENT(IN) :: LEN - INTEGER(k4), INTENT(IN) :: TOPOS - END SUBROUTINE -END INTERFACE -``` - -## Non-elemental intrinsic subroutines -``` -CALL CPU_TIME(REAL INTENT(OUT) TIME) -``` -The kind of `TIME` is not specified in the standard. - -``` -CALL DATE_AND_TIME([DATE, TIME, ZONE, VALUES]) -``` -* All arguments are `OPTIONAL` and `INTENT(OUT)`. -* `DATE`, `TIME`, and `ZONE` are scalar default `CHARACTER`. -* `VALUES` is a vector of at least 8 elements of `INTEGER(KIND >= 2)`. 
-``` -CALL EVENT_QUERY(EVENT, COUNT [, STAT]) -CALL EXECUTE_COMMAND_LINE(COMMAND [, WAIT, EXITSTAT, CMDSTAT, CMDMSG ]) -CALL GET_COMMAND([COMMAND, LENGTH, STATUS, ERRMSG ]) -CALL GET_COMMAND_ARGUMENT(NUMBER [, VALUE, LENGTH, STATUS, ERRMSG ]) -CALL GET_ENVIRONMENT_VARIABLE(NAME [, VALUE, LENGTH, STATUS, TRIM_NAME, ERRMSG ]) -CALL MOVE_ALLOC(ALLOCATABLE INTENT(INOUT) FROM, ALLOCATABLE INTENT(OUT) TO [, STAT, ERRMSG ]) -CALL RANDOM_INIT(LOGICAL(k1) INTENT(IN) REPEATABLE, LOGICAL(k2) INTENT(IN) IMAGE_DISTINCT) -CALL RANDOM_NUMBER(REAL(k) INTENT(OUT) HARVEST(..)) -CALL RANDOM_SEED([SIZE, PUT, GET]) -CALL SYSTEM_CLOCK([COUNT, COUNT_RATE, COUNT_MAX]) -``` - -## Atomic intrinsic subroutines -``` -CALL ATOMIC_ADD(ATOM, VALUE [, STAT=]) -CALL ATOMIC_AND(ATOM, VALUE [, STAT=]) -CALL ATOMIC_CAS(ATOM, OLD, COMPARE, NEW [, STAT=]) -CALL ATOMIC_DEFINE(ATOM, VALUE [, STAT=]) -CALL ATOMIC_FETCH_ADD(ATOM, VALUE, OLD [, STAT=]) -CALL ATOMIC_FETCH_AND(ATOM, VALUE, OLD [, STAT=]) -CALL ATOMIC_FETCH_OR(ATOM, VALUE, OLD [, STAT=]) -CALL ATOMIC_FETCH_XOR(ATOM, VALUE, OLD [, STAT=]) -CALL ATOMIC_OR(ATOM, VALUE [, STAT=]) -CALL ATOMIC_REF(VALUE, ATOM [, STAT=]) -CALL ATOMIC_XOR(ATOM, VALUE [, STAT=]) -``` - -## Collective intrinsic subroutines -``` -CALL CO_BROADCAST -CALL CO_MAX -CALL CO_MIN -CALL CO_REDUCE -CALL CO_SUM -``` - -# Non-standard intrinsics -## PGI -``` -AND, OR, XOR -LSHIFT, RSHIFT, SHIFT -ZEXT, IZEXT -COSD, SIND, TAND, ACOSD, ASIND, ATAND, ATAN2D -COMPL -DCMPLX -EQV, NEQV -INT8 -JINT, JNINT, KNINT -LOC -``` - -## Intel -``` -DCMPLX(X,Y), QCMPLX(X,Y) -DREAL(DOUBLE COMPLEX A) -> DOUBLE PRECISION -DFLOAT, DREAL -QEXT, QFLOAT, QREAL -DNUM, INUM, JNUM, KNUM, QNUM, RNUM - scan value from string -ZEXT -RAN, RANF -ILEN(I) = BIT_SIZE(I) -SIZEOF -MCLOCK, SECNDS -COTAN(X) = 1.0/TAN(X) -COSD, SIND, TAND, ACOSD, ASIND, ATAND, ATAN2D, COTAND - degrees -AND, OR, XOR -LSHIFT, RSHIFT -IBCHNG, ISHA, ISHC, ISHL, IXOR -IARG, IARGC, NARGS, NUMARG -BADDRESS, IADDR -CACHESIZE, EOF, FP_CLASS, 
INT_PTR_KIND, ISNAN, LOC -MALLOC -``` - -# Intrinsic Procedure Support in f18 -This section gives an overview of the support inside f18 libraries for the -intrinsic procedures listed above. -It may be outdated, refer to f18 code base for the actual support status. - -## Semantic Analysis -F18 semantic expression analysis phase detects intrinsic procedure references, -validates the argument types and deduces the return types. -This phase currently supports all the intrinsic procedures listed above but the ones in the table below. - -| Intrinsic Category | Intrinsic Procedures Lacking Support | -| --- | --- | -| Coarray intrinsic functions | LCOBOUND, UCOBOUND, FAILED_IMAGES, GET_TEAM, IMAGE_INDEX, STOPPED_IMAGES, TEAM_NUMBER, THIS_IMAGE, COSHAPE | -| Object characteristic inquiry functions | ALLOCATED, ASSOCIATED, EXTENDS_TYPE_OF, IS_CONTIGUOUS, PRESENT, RANK, SAME_TYPE, STORAGE_SIZE | -| Type inquiry intrinsic functions | BIT_SIZE, DIGITS, EPSILON, HUGE, KIND, MAXEXPONENT, MINEXPONENT, NEW_LINE, PRECISION, RADIX, RANGE, TINY| -| Non-standard intrinsic functions | AND, OR, XOR, LSHIFT, RSHIFT, SHIFT, ZEXT, IZEXT, COSD, SIND, TAND, ACOSD, ASIND, ATAND, ATAN2D, COMPL, DCMPLX, EQV, NEQV, INT8, JINT, JNINT, KNINT, LOC, QCMPLX, DREAL, DFLOAT, QEXT, QFLOAT, QREAL, DNUM, NUM, JNUM, KNUM, QNUM, RNUM, RAN, RANF, ILEN, SIZEOF, MCLOCK, SECNDS, COTAN, IBCHNG, ISHA, ISHC, ISHL, IXOR, IARG, IARGC, NARGS, NUMARG, BADDRESS, IADDR, CACHESIZE, EOF, FP_CLASS, INT_PTR_KIND, ISNAN, MALLOC | -| Intrinsic subroutines |MVBITS (elemental), CPU_TIME, DATE_AND_TIME, EVENT_QUERY, EXECUTE_COMMAND_LINE, GET_COMMAND, GET_COMMAND_ARGUMENT, GET_ENVIRONMENT_VARIABLE, MOVE_ALLOC, RANDOM_INIT, RANDOM_NUMBER, RANDOM_SEED, SYSTEM_CLOCK | -| Atomic intrinsic subroutines | ATOMIC_ADD &al. | -| Collective intrinsic subroutines | CO_BROADCAST &al. 
| - - -## Intrinsic Function Folding -Fortran Constant Expressions can contain references to a certain number of -intrinsic functions (see Fortran 2018 standard section 10.1.12 for more details). -Constant Expressions may be used to define kind arguments. Therefore, the semantic -expression analysis phase must be able to fold references to intrinsic functions -listed in section 10.1.12. - -F18 intrinsic function folding is either performed by implementations directly -operating on f18 scalar types or by using host runtime functions and -host hardware types. F18 supports folding elemental intrinsic functions over -arrays when an implementation is provided for the scalars (regardless of whether -it is using host hardware types or not). -The status of intrinsic function folding support is given in the sub-sections below. - -### Intrinsic Functions with Host Independent Folding Support -Implementations using f18 scalar types enables folding intrinsic functions -on any host and with any possible type kind supported by f18. The intrinsic functions -listed below are folded using host independent implementations. - -| Return Type | Intrinsic Functions with Host Independent Folding Support| -| --- | --- | -| INTEGER| ABS(INTEGER(k)), DIM(INTEGER(k), INTEGER(k)), DSHIFTL, DSHIFTR, IAND, IBCLR, IBSET, IEOR, INT, IOR, ISHFT, KIND, LEN, LEADZ, MASKL, MASKR, MERGE_BITS, POPCNT, POPPAR, SHIFTA, SHIFTL, SHIFTR, TRAILZ | -| REAL | ABS(REAL(k)), ABS(COMPLEX(k)), AIMAG, AINT, DPROD, REAL | -| COMPLEX | CMPLX, CONJG | -| LOGICAL | BGE, BGT, BLE, BLT | - -### Intrinsic Functions with Host Dependent Folding Support -Implementations using the host runtime may not be available for all supported -f18 types depending on the host hardware types and the libraries available on the host. -The actual support on a host depends on what the host hardware types are. -The list below gives the functions that are folded using host runtime and the related C/C++ types. 
-F18 automatically detects if these types match an f18 scalar type. If so, -folding of the intrinsic functions will be possible for the related f18 scalar type, -otherwise an error message will be produced by f18 when attempting to fold related intrinsic functions. - -| C/C++ Host Type | Intrinsic Functions with Host Standard C++ Library Based Folding Support | -| --- | --- | -| float, double and long double | ACOS, ACOSH, ASINH, ATAN, ATAN2, ATANH, COS, COSH, ERF, ERFC, EXP, GAMMA, HYPOT, LOG, LOG10, LOG_GAMMA, MOD, SIN, SQRT, SINH, SQRT, TAN, TANH | -| std::complex for float, double and long double| ACOS, ACOSH, ASIN, ASINH, ATAN, ATANH, COS, COSH, EXP, LOG, SIN, SINH, SQRT, TAN, TANH | - -On top of the default usage of C++ standard library functions for folding described -in the table above, it is possible to compile f18 evaluate library with -[libpgmath](https://github.com/flang-compiler/flang/tree/master/runtime/libpgmath) -so that it can be used for folding. To do so, one must have a compiled version -of the libpgmath library available on the host and add -`-DLIBPGMATH_DIR=` to the f18 cmake command. - -Libpgmath comes with real and complex functions that replace C++ standard library -float and double functions to fold all the intrinsic functions listed in the table above. -It has no long double versions. If the host long double matches an f18 scalar type, -C++ standard library functions will still be used for folding expressions with this scalar type. -Libpgmath adds the possibility to fold the following functions for f18 real scalar -types related to host float and double types. - -| C/C++ Host Type | Additional Intrinsic Function Folding Support with Libpgmath (Optional) | -| --- | --- | -|float and double| BESSEL_J0, BESSEL_J1, BESSEL_JN (elemental only), BESSEL_Y0, BESSEL_Y1, BESSEL_Yn (elemental only), ERFC_SCALED | - -Libpgmath comes in three variants (precise, relaxed and fast). 
So far, only the -precise version is used for intrinsic function folding in f18. It guarantees the greatest numerical precision. - -### Intrinsic Functions with Missing Folding Support -The following intrinsic functions are allowed in constant expressions but f18 -is not yet able to fold them. Note that there might be constraints on the arguments -so that these intrinsics can be used in constant expressions (see section 10.1.12 of Fortran 2018 standard). - -ALL, ACHAR, ADJUSTL, ADJUSTR, ANINT, ANY, BESSEL_JN (transformational only), -BESSEL_YN (transformational only), BTEST, CEILING, CHAR, COUNT, CSHIFT, DOT_PRODUCT, -DIM (REAL only), DOT_PRODUCT, EOSHIFT, FINDLOC, FLOOR, FRACTION, HUGE, IACHAR, IALL, -IANY, IPARITY, IBITS, ICHAR, IMAGE_STATUS, INDEX, ISHFTC, IS_IOSTAT_END, -IS_IOSTAT_EOR, LBOUND, LEN_TRIM, LGE, LGT, LLE, LLT, LOGICAL, MATMUL, MAX, MAXLOC, -MAXVAL, MERGE, MIN, MINLOC, MINVAL, MOD (INTEGER only), MODULO, NEAREST, NINT, -NORM2, NOT, OUT_OF_RANGE, PACK, PARITY, PRODUCT, REPEAT, REDUCE, RESHAPE, -RRSPACING, SCAN, SCALE, SELECTED_CHAR_KIND, SELECTED_INT_KIND, SELECTED_REAL_KIND, -SET_EXPONENT, SHAPE, SIGN, SIZE, SPACING, SPREAD, SUM, TINY, TRANSFER, TRANSPOSE, -TRIM, UBOUND, UNPACK, VERIFY. - -Coarray, non standard, IEEE and ISO_C_BINDINGS intrinsic functions that can be -used in constant expressions have currently no folding support at all. diff --git a/flang/documentation/LabelResolution.md b/flang/documentation/LabelResolution.md deleted file mode 100644 --- a/flang/documentation/LabelResolution.md +++ /dev/null @@ -1,288 +0,0 @@ - - -# Semantics: Resolving Labels and Construct Names - -## Overview - -After the Fortran input file(s) has been parsed into a syntax tree, the compiler must check that the program checks semantically. Target labels must be checked and violations of legal semantics should be reported to the user. - -This is the detailed design document on how these labels will be semantically checked. 
Legal semantics may result in rewrite operations on the syntax tree. Semantics violations will be reported as errors to the user. - -## Requirements - -- Input: a parse tree that decomposes the Fortran program unit -- Output: - * **Success** returns true - (Additionally, the parse tree may be rewritten on success to capture the nested DO loop structure explicitly from any _label-do-stmt_ type loops.) - * **Failure** returns false, instantiates (a container of) error message(s) to indicate the problem(s) - - -### Label generalities (6.2.5) - -Enforcement of the general label constraints. There are three sorts of label usage. Labels can serve - 1. as a _label-do-stmt_ block range marker - 1. as branching (control flow) targets - 1. as specification annotations (`FORMAT` statements) for data transfer statements (I/O constructs) - -Labels are related to the standard definition of inclusive scope. For example, control-flow arcs are not allowed to originate from one inclusive scope and target statements outside of that inclusive scope. - -Inclusive scope is defined as a tree structure of nested scoping constructs. A statement, _s_, is said to be *in* the same inclusive scope as another statement, _t_, if and only if _s_ and _t_ are in the same scope or _t_ is in one of the enclosing scopes of _s_, otherwise _s_ is *not in* the same inclusive scope as _t_. (Inclusive scope is unidirectional and is always from innermost scopes to outermost scopes.) 
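
The direction of inclusive scope can be sketched with a small example. The program below is hypothetical and only illustrates the rule as stated above: a branch may leave a construct for a label in an enclosing scope, but not enter a construct from outside.

```fortran
program inclusive_scope_demo
  implicit none
  integer :: i

  do i = 1, 3
    ! Legal: label 20 is in a scope enclosing this GOTO, so the target
    ! statement is in the same inclusive scope as the GOTO.
    if (i == 2) goto 20
  end do
20 continue
  print *, 'left the loop with i =', i   ! i is 2 here

  ! A "GOTO 10" placed here, with label 10 on a statement inside the
  ! DO body above, would branch *into* the construct. The statement
  ! labelled 10 would not be in the inclusive scope of that GOTO, and
  ! the label checks should report it.
end program inclusive_scope_demo
```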
- -#### Semantic Checks - -- labels range from 1 to 99999, inclusive (6.2.5 note 2) - * handled automatically by the parser, but add a range check -- labels must be pairwise distinct within their program unit scope (6.2.5 para 2) - * if redundant labels appear → error redundant labels - * the total number of unique statement labels may have a limit - - -### Labels Used for `DO` Loop Ranging - -#### _label-do-stmt_ (R1121) - -A _label-do-stmt_ is a control construct that results in the iterative execution of a number of statements. A _label-do-stmt_ has a (possibly shared, _nonblock-do-construct_) _label_ that will be called the loop target label. The statements to be executed will be the range from the _label-do-stmt_ to the statement identified by the loop target label, inclusive. This range of statements will be called the loop's body and logically forms a _do-block_. - -A _label-do-stmt_ is quite similar to a _block-do-construct_ in semantics, but the parse tree is different in that the parser does not impose a _do-block_ structure on the loop body. - -In F18, the nonblock `DO` construct has been removed. For legacy support (through F08), we will need to handle nonblock `DO` constructs. In F18, the following legacy code is an error. - -```fortran - DO 100 I = 1, 100 - DO 100 J = 1, 100 - ... - 100 CONTINUE -``` - -##### Semantic Checks - -- the loop body target label must exist in the scope (F18:C1133; F08:C815, C817, C819) - * if the label does not appear, error of missing label -- the loop body target label must be, lexically, after the _label-do-stmt_ (R1119) - * if the label appears lexically preceding the `DO`, error of malformed `DO` -- control cannot transfer into the body from outside the _do-block_ - * Exceptions (errors demoted to warnings) - - some implementations relax enforcement of this and allow `GOTO`s from the loop body to "extended ranges" and back again (PGI & gfortran appear to allow, NAG & Intel do not.)
- - should some form of "extended ranges" for _do-constructs_ be supported, it should still be limited and not include parallel loops such as `DO CONCURRENT` or loops annotated with OpenACC or OpenMP directives. - * `GOTO`s into the `DO`s inclusive scope, error/warn of invalid transfer of control -- requires that the loop terminating statement for a _label-do-stmt_ be either an `END DO` or a `CONTINUE` - * Exception - - earlier standards allowed other statements to be terminators - -Semantics for F08 and earlier that support sharing the loop terminating statement in a _nonblock-do-construct_ between multiple loops -- some statements cannot be _do-term-action-stmt_ (F08:C816) - * a _do-term-action-stmt_ is an _action-stmt_ but does not include _arithmetic-if-stmt_, _continue-stmt_, _cycle-stmt_, _end-function-stmt_, _end-mp-subprogram-stmt_, _end-program-stmt_, _end-subroutine-stmt_, _error-stop-stmt_, _exit-stmt_, _goto-stmt_, _return-stmt_, or _stop-stmt_ - - if the term action statement is forbidden, error invalid statement in `DO` loop term position -- some statements cannot be _do-term-shared-stmt_ (F08:C818) - * this is the case as in our above example where two different nested loops share the same terminating statement (`100 continue`) - * a _do-term-shared-stmt_ is an _action-stmt_ with all the same exclusions as a _do-term-action-stmt_ except a _continue-stmt_ **is** allowed - - if the term shared action statement is forbidden, error invalid statement in term position - -If the `DO` loop is a `DO CONCURRENT` construct, there are additional constraints (11.1.7.5). 
-- a _return-stmt_ is not allowed (C1136) -- image control statements are not allowed (C1137) -- branches must be from a statement and to a statement that both reside within the `DO CONCURRENT` (C1138) -- impure procedures shall not be called (C1139) -- deallocation of polymorphic objects is not allowed (C1140) -- references to `IEEE_GET_FLAG`, `IEEE_SET_HALTING_MODE`, and `IEEE_GET_HALTING_MODE` cannot appear in the body of a `DO CONCURRENT` (C1141) -- the use of the `ADVANCE=` specifier by an I/O statement in the body of a `DO CONCURRENT` is not allowed (11.1.7.5, para 5) - -### Labels Used in Branching - -#### _goto-stmt_ (11.2.2, R1157) - -A `GOTO` statement is a simple, direct transfer of control from the `GOTO` to the labelled statement. - -##### Semantic Checks - -- the labelled statement that is the target of a `GOTO` (11.2.1 constraints) - - must refer to a label that is in inclusive scope of the computed `GOTO` statement (C1169) - * if a label does not exist, error nonexistent label - * if a label is out of scope, error out of inclusive scope - - the branch target statement must be valid - * if the statement is not allowed as a branch target, error not a valid branch target -- the labelled statement must be a branch target statement - * a branch target statement is any of _action-stmt_, _associate-stmt_, _end-associate-stmt_, _if-then-stmt_, _end-if-stmt_, _select-case-stmt_, _end-select-stmt_, _select-rank-stmt_, _end-select-rank-stmt_, _select-type-stmt_, _end-select-type-stmt_, _do-stmt_, _end-do-stmt_, _block-stmt_, _end-block-stmt_, _critical-stmt_, _end-critical-stmt_, _forall-construct-stmt_, _forall-stmt_, _where-construct-stmt_, _end-function-stmt_, _end-mp-subprogram-stmt_, _end-program-stmt_, or _end-subroutine-stmt_. (11.2.1) - * Some deleted features that were _action-stmt_ in older standards include _arithmetic-if-stmt_, _assign-stmt_, _assigned-goto-stmt_, and _pause-stmt_. 
For legacy mode support, these statements should be considered _action-stmt_. - - -#### _computed-goto-stmt_ (11.2.3, R1158) - -The computed `GOTO` statement is analogous to a `switch` statement in C++. - -```fortran - GOTO ( label-list ) [,] scalar-int-expr -``` - -##### Semantics Checks - -- each label in _label-list_ (11.2.1 constraints, same as `GOTO`) - - must refer to a label that is in inclusive scope of the computed `GOTO` statement (C1170) - * if a label does not exist, error nonexistent label - * if a label is out of scope, error out of inclusive scope - - the branch target statement must be valid - * if the statement is not allowed as a branch target, error not a valid branch target -- the _scalar-int-expr_ needs to have `INTEGER` type - * check the type of the expression (type checking done elsewhere) - - -#### R853 _arithmetic-if-stmt_ (F08:8.2.4) - -This control-flow construct is deleted in F18. - -```fortran - IF (scalar-numeric-expr) label1,label2,label3 -``` - -The arithmetic if statement is like a three-way branch operator. If the scalar numeric expression is less than zero goto _label-1_, else if the variable is equal to zero goto _label-2_, else if the variable is greater than zero goto _label-3_. - -##### Semantics Checks - -- the labels in the _arithmetic-if-stmt_ triple must all be present in the inclusive scope (F08:C848) - * if a label does not exist, error nonexistent label - * if a label is out of scope, error out of inclusive scope -- the _scalar-numeric-expr_ must not be `COMPLEX` (F08:C849) - * check the type of the expression (type checking done elsewhere) - - -#### _alt-return-spec_ (15.5.1, R1525) - -These are a Fortran control-flow construct for combining a return from a subroutine with a branch to a labelled statement in the calling routine all in one operation. 
A typical implementation is for the subroutine to return a hidden integer, which is used as a key in the calling code to then, possibly, branch to a labelled statement in inclusive scope. - -The labels are passed by the calling routine. We want to check those labels at the call-site, that is instances of _alt-return-spec_. - -##### Semantics Checks - -- each _alt-return-spec_ (11.2.1 constraints, same as `GOTO`) - - must refer to a label that is in inclusive scope of the `CALL` statement - * if a label does not exist, error nonexistent label - * if a label is out of scope, error out of inclusive scope - - the branch target statement must be valid - * if the statement is not allowed as a branch target, error not a valid branch target - - -#### **END**, **EOR**, **ERR** specifiers (12.11) - -These specifiers can appear in I/O statements and can transfer control to specific labelled statements under exceptional conditions like end-of-file, end-of-record, and other error conditions. (The PGI compiler adds code to test the results from the runtime routines to determine if these branches should take place.) - -##### Semantics Checks - -- each END, EOR, and ERR specifier (11.2.1 constraints, same as `GOTO`) - - must refer to a label that is in inclusive scope of the I/O statement - * if a label does not exist, error nonexistent label - * if a label is out of scope, error out of inclusive scope - - the branch target statement must be valid - * if the statement is not allowed as a branch target, error not a valid branch target - -#### _assigned-goto-stmt_ and _assign-stmt_ (F90:8.2.4) - -Deleted feature since Fortran 95. - -The _assigned-goto-stmt_ and _assign-stmt_ were _action-stmt_ in the Fortran 90 standard. They are included here for completeness. This pair of obsolete statements can (will) be enabled as part of the compiler's legacy Fortran support. - -The _assign-stmt_ stores a _label_ in an integer variable. 
The _assigned-goto-stmt_ will then transfer control to the _label_ stored in that integer variable. - -```fortran - ASSIGN 10 TO i - ... - GOTO i (10,20,30) -``` - -##### Semantic Checks - -- an _assigned-goto-stmt_ cannot be a _do-term-action-stmt_ (F90:R829) -- an _assigned-goto-stmt_ cannot be a _do-term-shared-stmt_ (F90:R833) -- constraints from (F90:R839) - - each _label_ in an optional _label-list_ must be the statement label of a branch target statement that appears in the same scoping unit as the _assigned-goto-stmt_ - - _scalar-int-variable_ (`i` in the example above) must be named and of type default integer - - an integer variable that has been assigned a label may only be referenced in an _assigned-goto_ or as a format specifier in an I/O statement - - when an I/O statement with a _format-specifier_ that is an integer variable is executed or when an _assigned-goto_ is executed, the variable must have been assigned a _label_ - - an integer variable can only be assigned a label via the `ASSIGN` statement - - the label assigned to the variable must be in the same scoping unit as the _assigned-goto_ that branches to the _label_ value - - if the parameterized list of labels is present, the label value assigned to the integer variable must appear in that _label-list_ - - a distinct _label_ can appear more than once in the _label-list_ - -Some interpretation is needed as the terms of the older standard are different. - -A "scoping unit" is defined as - - a derived-type definition - - a procedure interface body, excluding derived-types and interfaces contained within it - - a program unit or subprogram, excluding derived-types, interfaces, and subprograms contained within it - -This is a more lax definition of scope than inclusive scope. - -A _named variable_ distinguishes a variable such as, `i`, from an element of an array, `a(i)`, for example. 
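The F90:R839 constraints above can be read as a small decision procedure. The following C++ is a minimal sketch of that reading, not flang code: `LabelInfo`, `LabelCheck`, and `CheckAssignedGoto` are hypothetical names, scoping units are modelled as plain integer ids, and the label held by the integer variable is assumed to be known (as it is for `ASSIGN 10 TO i` in the example), which a real compiler generally cannot assume.

```cpp
#include <algorithm>
#include <map>
#include <vector>

// Hypothetical bookkeeping for one labelled statement; not a flang type.
struct LabelInfo {
  int scopingUnit;      // id of the scoping unit that contains the statement
  bool isBranchTarget;  // whether the statement may be a branch target
};

enum class LabelCheck {
  Ok,
  Nonexistent,
  OutOfScope,
  NotBranchTarget,
  NotInLabelList
};

// Checks the label assumed to be stored in the variable of an assigned GOTO.
// An empty labelList models an assigned GOTO written without a label-list.
inline LabelCheck CheckAssignedGoto(const std::map<int, LabelInfo> &labels,
    int assignedLabel, int gotoScopingUnit,
    const std::vector<int> &labelList) {
  auto it = labels.find(assignedLabel);
  if (it == labels.end()) {
    return LabelCheck::Nonexistent;
  }
  // F90 uses the scoping unit here, not the F18 inclusive scope.
  if (it->second.scopingUnit != gotoScopingUnit) {
    return LabelCheck::OutOfScope;
  }
  if (!it->second.isBranchTarget) {
    return LabelCheck::NotBranchTarget;
  }
  // If a label-list is present, the assigned label must appear in it.
  if (!labelList.empty() &&
      std::find(labelList.begin(), labelList.end(), assignedLabel) ==
          labelList.end()) {
    return LabelCheck::NotInLabelList;
  }
  return LabelCheck::Ok;
}
```

For the example above, a caller would pass label `10`, the id of the scoping unit containing the `GOTO`, and the list `{10, 20, 30}`. The order of the tests mirrors the order of the error cases used elsewhere in this document (nonexistent label, out of scope, not a valid branch target).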
- -### Labels used in I/O - -#### Data transfer statements - -In data transfer (I/O) statements (e.g., `READ`), the user can specify a `FMT=` specifier that can take a label as its argument. (R1215) - -##### Semantic Checks - -- if the `FMT=` specifier has a label as its argument (C1230) - - the label must correspond to a `FORMAT` statement - * if the statement is not a `FORMAT`, error statement must be a `FORMAT` - - the labelled `FORMAT` statement must be in the same inclusive scope as the originating data transfer statement (also in 2008) - * if the label statement does not exist, error label does not exist - * if the label statement is not in scope, error label is not in inclusive scope - - Exceptions (errors demoted to warnings) - - PGI extension: referenced `FORMAT` statements may appear in a host procedure - - Possible relaxation: the scope of the referenced `FORMAT` statement may be ignored, allowing a `FORMAT` to be referenced from any scope in the compilation. - -### Construct Name generalities - -Various Fortran constructs can have names. These include - - the `WHERE` construct (10.2.3) - - the `FORALL` construct (10.2.4) - - the `ASSOCIATE` construct (11.1.3) - - the `BLOCK` construct (11.1.4) - - the `CHANGE TEAM` construct (11.1.5) - - the `CRITICAL` construct (11.1.6) - - the `DO` construct (11.1.7) - - the `IF` construct (11.1.8) - - the `SELECT CASE` construct (11.1.9) - - the `SELECT RANK` construct (11.1.10) - - the `SELECT TYPE` construct (11.1.11) - -#### Semantics Checks - -A construct name is a name formed under 6.2.2. A name is an identifier. Identifiers are parsed by the parser. - - the maximum length of a name is 63 characters (C601) - -Names must either not be given for the construct or used throughout when specified. -- if a construct is given a name, the construct's `END` statement must also specify the same name (`WHERE` C1033, `FORALL` C1035, ...) 
-- `WHERE` has additional `ELSEWHERE` clauses -- `IF` has additional `ELSE IF` and `ELSE` clauses -- `SELECT CASE` has additional `CASE` clauses -- `SELECT RANK` has additional `RANK` clauses -- `SELECT TYPE` has additional _type-guard-stmt_ -These additional statements must meet the same constraint as the `END` of the construct. Names must match, if present, or there must be no names for any of the clauses. - -### `CYCLE` statement (11.1.7.4.4) - -The `CYCLE` statement takes an optional _do-construct-name_. - -#### Semantics Checks - -- if the `CYCLE` has a _construct-name_, then the `CYCLE` statement must appear within that named _do-construct_ (C1134) -- if the `CYCLE` does not have a _do-construct-name_, the `CYCLE` statement must appear within a _do-construct_ (C1134) - -### `EXIT` statement (11.1.12) - -The `EXIT` statement takes an optional _construct-name_. - -#### Semantics Checks - -- if the `EXIT` has a _construct-name_, then the `EXIT` statement must appear within that named construct (C1166) -- if the `EXIT` does not have a _construct-name_, the `EXIT` statement must appear within a _do-construct_ (C1166) -- an _exit-stmt_ must not appear in a `DO CONCURRENT` if the `EXIT` belongs to the `DO CONCURRENT` or an outer construct enclosing the `DO CONCURRENT` (C1167) -- an _exit-stmt_ must not appear in a `CHANGE TEAM` (`CRITICAL`) if the `EXIT` belongs to an outer construct enclosing the `CHANGE TEAM` (`CRITICAL`) (C1168) - diff --git a/flang/documentation/OpenMP-semantics.md b/flang/documentation/OpenMP-semantics.md deleted file mode 100644 --- a/flang/documentation/OpenMP-semantics.md +++ /dev/null @@ -1,670 +0,0 @@ - - -# OpenMP Semantic Analysis - -## OpenMP for F18 - -1. Define and document the parse tree representation for - * Directives (listed below) - * Clauses (listed below) - * Documentation -1. All the directives and clauses need source provenance for messages -1. 
Define and document how an OpenMP directive in the parse tree -will be represented as the parent of the statement(s) -to which the directive applies. -The parser itself will not be able to construct this representation; -there will be subsequent passes that do so, -just as is done, for example, for _do-stmt_ and _do-construct_. -1. Define and document the symbol table extensions -1. Define and document the module file extensions - - -### Directives - -OpenMP divides directives into three categories as follows. -Directives in the same category share some characteristics. - - - -#### Declarative directives - -An OpenMP directive that may only be placed in a declarative context. -A declarative directive results in one or more declarations only; -it is not associated with the immediate execution of any user code. - -List of existing ones: -* declare simd -* declare target -* threadprivate -* declare reduction - -There is a parser node for each of these directives and -the parser node saves information associated with the directive, -for example, -the procedure name in the `declare simd` directive. - -Each parse tree node keeps source provenance, -one for the directive name itself and -one for the entire directive starting from the directive name. - -A top-level class, `OpenMPDeclarativeConstruct`, -holds all four of the node types as discriminated unions -along with the source provenance for the entire directive -starting from `!$OMP`. - -In `parse-tree.h`, -`OpenMPDeclarativeConstruct` is part -of the `SpecificationConstruct` and `SpecificationPart` -in F18 because -a declarative directive can only be placed in the specification part -of a Fortran program. - -All the `Names` or `Designators` associated -with the declarative directive will be resolved in later phases. - -#### Executable directives - -An OpenMP directive that is **not** declarative. -That is, it may only be placed in an executable context.
-It contains stand-alone directives and constructs -that are associated with code blocks. -The stand-alone directive is described in the next section. - -The constructs associated with code blocks listed below -share a similar structure: -_Begin Directive_, _Clause List_, _Code Block_, _End Directive_. -The _End Directive_ is optional for constructs -like Loop-associated constructs. - -* Block-associated constructs (`OpenMPBlockConstruct`) -* Loop-associated constructs (`OpenMPLoopConstruct`) -* Atomic construct (`OpenMPAtomicConstruct`) -* Sections Construct (`OpenMPSectionsConstruct`, - contains Sections/Parallel Sections constructs) -* Critical Construct (`OpenMPCriticalConstruct`) - -A top-level class, `OpenMPConstruct`, -includes stand-alone directive and constructs -listed above as discriminated unions. - -In the `parse-tree.h`, `OpenMPConstruct` is an element -of the `ExecutableConstruct`. - -All the `Names` or `Designators` associated -with the executable directive will be resolved in Semantic Analysis. - -When the backtracking parser can not identify the associated code blocks, -the parse tree will be rewritten later in the Semantics Analysis. - -#### Stand-alone Directives - -An OpenMP executable directive that has no associated user code -except for that which appears in clauses in the directive. - -List of existing ones: -* taskyield -* barrier -* taskwait -* target enter data -* target exit data -* target update -* ordered -* flush -* cancel -* cancellation point - -A higher-level class is created for each category -which contains directives listed above that share a similar structure: -* OpenMPSimpleStandaloneConstruct -(taskyield, barrier, taskwait, -target enter/exit data, target update, ordered) -* OpenMPFlushConstruct -* OpenMPCancelConstruct -* OpenMPCancellationPointConstruct - -A top-level class, `OpenMPStandaloneConstruct`, -holds all four of the node types as discriminated unions -along with the source provenance for the entire directive. 
-Also, each parser node for the stand-alone directive saves -the source provenance for the directive name itself. - -### Clauses - -Each clause is represented as a distinct class in `parse-tree.h`. -A top-level class, `OmpClause`, -includes all the clauses as discriminated unions. -The parser node for `OmpClause` saves the source provenance -for the entire clause. - -All the `Names` or `Designators` associated -with the clauses will be resolved in Semantic Analysis. - -Note that the backtracking parser will not validate -that the list of clauses associated -with a directive is valid other than to make sure they are well-formed. -In particular, -the parser does not check that -the association between directive and clauses is correct -nor check that the values in the directives or clauses are correct. -These checks are deferred to later phases of semantics to simplify the parser. - -## Symbol Table Extensions for OpenMP - -Name resolution can be impacted by the OpenMP code. -In addition to the regular steps to do the name resolution, -new scopes and symbols may need to be created -when encountering certain OpenMP constructs. -This section describes the extensions -for OpenMP during Symbol Table construction. - -OpenMP uses the fork-join model of parallel execution and -all OpenMP threads have access to -a _shared_ memory place to store and retrieve variables -but each thread can also have access to -its _threadprivate_ memory that must not be accessed by other threads. - -For the directives and clauses that can control the data environments, -the compiler needs to determine two kinds of _access_ -to variables used in the directive’s associated structured block: -**shared** and **private**. -Each variable referenced in the structured block -has an original variable immediately outside of the OpenMP constructs. -A reference to a shared variable in the structured block -becomes a reference to the original variable.
-However, each private variable referenced in the structured block, -a new version of the original variable (of the same type and size) -will be created in the threadprivate memory. - -There are exceptions that directives/clauses -need to create a new `Symbol` without creating a new `Scope`, -but in general, -when encountering each of the data environment controlling directives -(discussed in the following sections), -a new `Scope` will be created. -For each private variable referenced in the structured block, -a new `Symbol` is created out of the original variable -and the new `Symbol` is associated -with original variable’s `Symbol` via `HostAssocDetails`. -A new set of OpenMP specific flags are added -into `Flag` class in `symbol.h` to indicate the types of -associations, -data-sharing attributes, -and data-mapping attributes -in the OpenMP data environments. - -### New Symbol without new Scope - -OpenMP directives that require new `Symbol` to be created -but not new `Scope` are listed in the following table -in terms of the Symbol Table extensions for OpenMP: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Directives/Clauses - Create New -

-Symbol -

-w/ -

Add Flag -
on Symbol of - Flag -
Declarative Directives - declare simd [(proc-name)] - - - The name of the enclosing function, subroutine, or interface body - to which it applies, or proc-name - OmpDeclareSimd -
declare target - - - The name of the enclosing function, subroutine, or interface body - to which it applies - OmpDeclareTarget -
threadprivate(list) - - - named variables and named common blocks - OmpThreadPrivate -
declare reduction - * - reduction-identifier - OmpDeclareReduction -
Stand-alone directives - flush - - - variable, array section or common block name - OmpFlushed -
critical [(name)] - - - name (user-defined identifier) - OmpCriticalLock -
if ([ directive-name-modifier :] scalar-logical-expr) - - - directive-name-modifier - OmpIfSpecified -
- - - - No Action - - * Discussed in “Module File Extensions for OpenMP” section - - -### New Symbol with new Scope - -For the following OpenMP regions: - -* `target` regions -* `teams` regions -* `parallel` regions -* `simd` regions -* task generating regions (created by `task` or `taskloop` constructs) -* worksharing regions -(created by `do`, `sections`, `single`, or `workshare` constructs) - -A new `Scope` will be created -when encountering the above OpenMP constructs -to ensure the correct data environment during the Code Generation. -To determine whether a variable referenced in these regions -needs the creation of a new `Symbol`, -all the data-sharing attribute rules -described in OpenMP Spec [2.15.1] apply during the Name Resolution. -The available data-sharing attributes are: -**_shared_**, -**_private_**, -**_linear_**, -**_firstprivate_**, -and **_lastprivate_**. -The attribute is represented as `Flag` in the `Symbol` object. - -More details are listed in the following table: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Attribute - Create New Symbol - Add Flag -
on Symbol of - Flag -
shared - No - Original variable - OmpShared -
private - Yes - New Symbol - OmpPrivate -
linear - Yes - New Symbol - OmpLinear -
firstprivate - Yes - New Symbol - OmpFirstPrivate -
lastprivate - Yes - New Symbol - OmpLastPrivate -
- -To determine the right data-sharing attribute, -OpenMP defines that the data-sharing attributes -of variables that are referenced in a construct can be -_predetermined_, _explicitly determined_, or _implicitly determined_. - -#### Predetermined data-sharing attributes - -* Assumed-size arrays are **shared** -* The loop iteration variable(s) -in the associated _do-loop(s)_ of a -_do_, -_parallel do_, -_taskloop_, -or _distribute_ construct -is (are) **private** -* A loop iteration variable -for a sequential loop in a _parallel_ or task generating construct -is **private** in the innermost such construct that encloses the loop -* Implied-do indices and _forall_ indices are **private** -* The loop iteration variable in the associated _do-loop_ -of a _simd_ construct with just one associated _do-loop_ -is **linear** with a linear-step -that is the increment of the associated _do-loop_ -* The loop iteration variables in the associated _do-loop(s)_ of a _simd_ -construct with multiple associated _do-loop(s)_ are **lastprivate** - -#### Explicitly determined data-sharing attributes - -Variables with _explicitly determined_ data-sharing attributes are those that are: - -* referenced in a given construct -* listed in a data-sharing attribute clause on the construct - -The data-sharing attribute clauses are: -* _default_ clause -(discussed in “Implicitly determined data-sharing attributes”) -* _shared_ clause -* _private_ clause -* _linear_ clause -* _firstprivate_ clause -* _lastprivate_ clause -* _reduction_ clause -(new `Symbol` created with the flag `OmpReduction` set) - -Note that variables with _predetermined_ data-sharing attributes -may not be listed (with exceptions) in data-sharing attribute clauses.
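The attribute table and the clause list above amount to a lookup from a data-sharing attribute to two facts: whether name resolution creates a new `Symbol`, and which flag is set. The C++ below is a minimal sketch of that lookup under stated assumptions: `DataSharingAttr`, `SymbolAction`, and `ActionFor` are hypothetical names, and the flags are carried as strings for illustration only. The flag names themselves come from this document; how flang's `Symbol` and `Flag` classes actually expose them is not shown here.

```cpp
#include <string_view>

// Hypothetical enumeration of the attributes discussed above.
enum class DataSharingAttr {
  Shared,
  Private,
  Linear,
  FirstPrivate,
  LastPrivate,
  Reduction
};

// Hypothetical result type: what name resolution should do for a variable.
struct SymbolAction {
  bool createsNewSymbol;  // false: the flag goes on the original variable
  std::string_view flag;  // flag name as listed in this document
};

constexpr SymbolAction ActionFor(DataSharingAttr attr) {
  switch (attr) {
  case DataSharingAttr::Shared:
    return {false, "OmpShared"};  // refers to the original variable
  case DataSharingAttr::Private:
    return {true, "OmpPrivate"};
  case DataSharingAttr::Linear:
    return {true, "OmpLinear"};
  case DataSharingAttr::FirstPrivate:
    return {true, "OmpFirstPrivate"};
  case DataSharingAttr::LastPrivate:
    return {true, "OmpLastPrivate"};
  case DataSharingAttr::Reduction:
    return {true, "OmpReduction"};  // from the reduction clause entry
  }
  return {false, ""};  // not reached for valid enumerators
}
```

Only **shared** leaves the original `Symbol` in place; every other attribute, including the one set by a _reduction_ clause, creates a new `Symbol` in the construct's `Scope`.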
- -#### Implicitly determined data-sharing attributes - -Variables with implicitly determined data-sharing attributes are: - -* Variables are referenced in a given construct -* Variables do not have _predetermined_ data-sharing attributes -* Variables are not listed in a data-sharing attribute clause -on the construct. - -Rules for variables with _implicitly determined_ data-sharing attributes: - -* In a _parallel_ construct, if no _default_ clause is present, -these variables are **shared** -* In a task generating construct, -if no _default_ clause is present, -a variable for which the data-sharing attribute -is not determined by the rules above -and that in the enclosing context is determined -to be shared by all implicit tasks -bound to the current team is **shared** -* In a _target_ construct, -variables that are not mapped after applying data-mapping attribute rules -(discussed later) are **firstprivate** -* In an orphaned task generating construct, -if no _default_ clause is present, dummy arguments are **firstprivate** -* In a task generating construct, if no _default_ clause is present, -a variable for which the data-sharing attribute is not determined -by the rules above is **firstprivate** -* For constructs other than task generating constructs or _target_ constructs, -if no _default_ clause is present, -these variables reference the variables with the same names -that exist in the enclosing context -* In a _parallel_, _teams_, or task generating construct, -the data-sharing attributes of these variables are determined -by the _default_ clause, if present: - * _default(shared)_ - clause causes all variables referenced in the construct - that have _implicitly determined_ data-sharing attributes - to be **shared** - * _default(private)_ - clause causes all variables referenced in the construct - that have _implicitly determined_ data-sharing attributes - to be **private** - * _default(firstprivate)_ - clause causes all variables referenced in the construct 
- that have _implicitly determined_ data-sharing attributes - to be **firstprivate** - * _default(none)_ - clause requires that each variable - that is referenced in the construct, - and that does not have a _predetermined_ data-sharing attribute, - must have its data-sharing attribute _explicitly determined_ - by being listed in a data-sharing attribute clause - - -### Data-mapping Attribute - -When encountering the _target data_ and _target_ directives, -the data-mapping attributes of any variable referenced in a target region -will be determined and represented as `Flag` in the `Symbol` object -of the variable. -No `Symbol` or `Scope` will be created. - -The basic steps to determine the data-mapping attribute are: - -1. If _map_ clause is present, -the data-mapping attribute is determined by the _map-type_ -on the clause and its corresponding `Flag` are listed below: - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
-data-mapping attribute - Flag -
to - OmpMapTo -
from - OmpMapFrom -
tofrom -(default if map-type is not present) - OmpMapTo & OmpMapFrom -
alloc - OmpMapAlloc -
release - OmpMapRelease -
delete - OmpMapDelete -
- -2. Otherwise, the following data-mapping rules apply -for variables referenced in a _target_ construct -that are _not_ declared in the construct and -do not appear in data-sharing attribute or map clauses: - * If a variable appears in a _to_ or _link_ clause - on a _declare target_ directive then it is treated - as if it had appeared in a _map_ clause with a _map-type_ of **tofrom** -3. Otherwise, the following implicit data-mapping attribute rules apply: - * If a _defaultmap(tofrom:scalar)_ clause is _not_ present - then a scalar variable is not mapped, - but instead has an implicit data-sharing attribute of **firstprivate** - * If a _defaultmap(tofrom:scalar)_ clause is present - then a scalar variable is treated as if it had appeared - in a map clause with a map-type of **tofrom** - * If a variable is not a scalar - then it is treated as if it had appeared in a map clause - with a _map-type_ of **tofrom** - -After the completion of the Name Resolution phase, -all the data-sharing or data-mapping attributes marked for the `Symbols` -may be used later in the Semantics Analysis and in the Code Generation. - -## Module File Extensions for OpenMP - -After the successful compilation of modules and submodules -that may contain the following Declarative Directives, -the entire directive starting from `!$OMP` needs to be written out -into `.mod` files in their corresponding Specification Part: - -* _declare simd_ or _declare target_ - - In the “New Symbol without new Scope” section, - we described that when encountering these two declarative directives, - new `Flag` will be applied to the Symbol of the name of - the enclosing function, subroutine, or interface body to - which it applies, or proc-name. - This `Flag` should be part of the API information - for the given subroutine or function - -* _declare reduction_ - - The _reduction-identifier_ in this directive - can be use-associated or host-associated. 
- However, it will not act like other Symbols - because user may have a reduction name - that is the same as a Fortran entity name in the same scope. - Therefore a specific data structure needs to be created - to save the _reduction-identifier_ information - in the Scope and this directive needs to be written into `.mod` files - -## Phases of OpenMP Analysis - -1. Create the parse tree for OpenMP - 1. Add types for directives and clauses - 1. Add type(s) that will be used for directives - 2. Add type(s) that will be used for clauses - 3. Add other types, e.g. wrappers or other containers - 4. Use std::variant to encapsulate meaningful types - 2. Implemented in the parser for OpenMP (openmp-grammar.h) -2. Create canonical nesting - 1. Restructure parse tree to reflect the association - of directives and stmts - 1. Associate `OpenMPLoopConstruct` - with `DoConstruct` and `OpenMPEndLoopDirective` - 1. Investigate, and perhaps reuse, - the algorithm used to restructure do-loops - 2. Add a pass near the code that restructures do-loops; - but do not extend the code that handles do-loop for OpenMP; - keep this code separate. - 3. Report errors that prevent restructuring - (e.g. loop directive not followed by loop) - We should abort in case of errors - because there is no point to perform further checks - if it is not a legal OpenMP construct -3. Validate the structured-block - 1. Structured-block is a block of executable statements - 1. Single entry and single exit - 1. Access to the structured block must not be the result of a branch - 1. The point of exit cannot be a branch out of the structured block -4. Check that directive and clause combinations are legal - 1. Begin and End directive should match - 1. Simply check that the clauses are allowed by the directives - 1. Write as a separate pass for simplicity and correctness of the parse tree -5. Write parse tree tests - 1. At this point, the parse tree should be perfectly formed - 1. 
Write tests that check for correct form and provenance information - 1. Write tests for errors that can occur during the restructuring -6. Scope, symbol tables, and name resolution - 1. Update the existing code to handle names and scopes introduced by OpenMP - 1. Write tests to make sure names are properly implemented -7. Check semantics that is specific to each directive - 1. Validate the directive and its clauses - 1. Some clause checks require the result of name resolution, - i.e. “A list item may appear in a _linear_ or _firstprivate_ clause - but not both.” - 1. TBD: - Validate the nested statement for legality in the scope of the directive - 1. Check the nesting of regions [OpenMP 4.5 spec 2.17] -8. Module file utilities - 1. Write necessary OpenMP declarative directives to `.mod` files - 2. Update the existing code - to read available OpenMP directives from the `.mod` files diff --git a/flang/documentation/OptionComparison.md b/flang/documentation/OptionComparison.md deleted file mode 100644 --- a/flang/documentation/OptionComparison.md +++ /dev/null @@ -1,1339 +0,0 @@ - - -# Compiler options - -This document catalogs the options processed by F18's peers/competitors. Much of the document is taken up by a set of tables that list the options categorized into different topics. Some of the table headings link to more information about the contents of the tables. For example, the table on **Standards conformance** options links to [notes on Standards conformance](#standards). - -**There's also important information in the ___[Notes section](#notes)___ near the end of the document on how this data was gathered and what ___is___ and ___is not___ included in this document.** - -Note that compilers may support language features without having an option for them. Such cases are frequently, but not always noted in this document. - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Standards conformance -
Option Cray GNU IBM Intel PGI Flang
Overall conformance en, -

-eN -

std=level qlanglvl, qsaa - stand level - Mstandard - Mstandard -
Compatibility with previous standards or implementations - N/A - fdec, -

-fall-intrinsics -

qxlf77, -

-qxlf90, -

-qxlf2003, -

-qxlf2008, -

-qport -

f66, -

-f77rtl, -

-fpscomp, -

-intconstant, -

-nostandard-realloc-lhs, -

-standard-semantics, -

-assume nostd_intent_in, -

-assume nostd_value, -

-assume norealloc_lhs -

Mallocatable=95|03 - Mallocatable=95|03 -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Source format -
Option - Cray - GNU - IBM - Intel - PGI - Flang -
Fixed or free source - f free, -

-f fixed -

ffree-form, -

-ffixed-form -

qfree, -

-qfixed -

fixed, -

-free -

Mfree, -

-Mfixed -

Mfreeform, -

-Mfixed -

Source line length - N col - ffixed-line-length-n, -

-ffree-line-length-n -

qfixed=n - extend-source [size] - Mextend - Mextend -
Column 1 comment specifier - ed - fd-lines-as-code, -

-fd-lines-as-comments -

D, -

-qdlines, -

-qxlines -

d-lines - Mdlines - N/A -
Don't treat CR character as a line terminator - N/A - N/A - qnocr - N/A - N/A - N/A -
Source file naming - N/A - N/A - qsuffix - extfor, -

-Tf filename -

N/A - N/A -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Names, Literals, and other tokens -
Option - Cray - GNU - IBM - Intel - PGI - Flang -
Max identifier length - N/A - fmax-identifier-length=n - N/A - N/A - N/A - N/A -
"$" in symbol names - N/A - fdollar-ok - default - default - N/A - N/A -
Allow names with leading "_" - eQ - N/A - N/A - N/A - N/A - N/A -
Specify name format - N/A - N/A - U - names=keyword - Mupcase - N/A -
Escapes in literals - N/A - fbackslash - qescape - assume bscc - Mbackslash - Mbackslash -
Allow multibyte characters in strings - N/A - N/A - qmbcs - N/A - N/A - N/A -
Create null terminated strings - N/A - N/A - qnullterm - N/A - N/A - N/A -
Character to use for "$" - N/A - N/A - N/A - N/A - Mdollar,char - -
Allow PARAMETER statements without parentheses - N/A - N/A - N/A - altparam - N/A - N/A -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
DO loop handling -
Option - Cray - GNU - IBM - Intel - PGI - Flang -
One trip DO loops - ej - N/A - 1, -

-qonetrip -

f66 - Monetrip - N/A -
Allow branching into loops - eg - N/A - N/A - N/A - N/A - N/A -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
REAL, DOUBLE PRECISION, and COMPLEX Data -
Option - Cray - GNU - IBM - Intel - PGI - Flang -
Default REAL size - s real32, -

-s real64, -

-s default32, -

-s default64 -

fdefault-real-[8|10|16] - qrealsize=[4|8] - real-size [32|64|128] - r[4|8] - r8, -

-fdefault-real-8 -

Default DOUBLE PRECISION size - ep - fdefault-double-8 - N/A - double-size[64|128] - N/A - N/A -
Make real constants DOUBLE PRECISION - N/A - N/A - qdpc - N/A - N/A - N/A -
Promote or demote REAL type sizes - N/A - freal-[4|8]-real-[4|8|10|16] - qautodbl=size - N/A - Mr8, -

-Mr8intrinsics -

N/A -
Rounding mode - N/A - N/A - qieee - assume std_minus0_rounding - N/A - N/A -
Treatment of -0.0 - N/A - N/A - N/A - assume minus0 - - -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
INTEGER and LOGICAL Data -
Option - Cray - GNU - IBM - Intel - PGI - Flang -
Default INTEGER size - s integer32, -

-s integer64, -

-s default32, -

-s default64 -

fdefault-integer-8 - qintsize=[2|4|8] - integer-size [32|64|128] - I[2|4|8], -

-Mi4, -

-Mnoi4 -

i8, -

-fdefault-integer-8 -

Promote INTEGER sizes - N/A - finteger-4-integer-8 - N/A - N/A - N/A - N/A -
Enable 8 and 16 bit INTEGER and LOGICALS - eh - N/A - N/A - N/A - N/A - N/A -
Change how the compiler treats LOGICAL - N/A - N/A - N/A - N/A - Munixlogical - -
Treatment of numeric constants as arguments - N/A - N/A - qxlf77 oldboz - assume old_boz - N/A - N/A -
Treatment of assignment between numerics and logicals - N/A - N/A - N/A - assume old_logical_assign - N/A - N/A -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
CHARACTER and Pointer Data -
Option - Cray - GNU - IBM - Intel - PGI - Flang -
Use bytes for pointer arithmetic - s byte_pointer - N/A - N/A - N/A - N/A - N/A -
Use words for pointer arithmetic - S word_pointer - N/A - N/A - N/A - N/A - N/A -
Allow character constants for typeless constants - N/A - N/A - qctyplss - N/A - N/A - N/A -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Data types and allocation -
Option - Cray - GNU - IBM - Intel - PGI - Flang -
Default to IMPLICIT NONE - eI - fimplicit-none - u, qundef - warn declarations - Mdclchk - N/A -
Enable DEC STRUCTURE extensions - N/A - fdec-structure - N/A - N/A - default - N/A -
Enable Cray pointers - default - fcray-pointer - Default (near equivalent) - Default (near equivalent) - Mcray - N/A -
Allow bitwise logical operations on numeric - ee - N/A - qintlog - N/A - N/A - N/A -
Allow DEC STATIC and AUTOMATIC declarations - default - fdec-static - Default, see IMPLICIT STATIC and IMPLICIT AUTOMATIC - Default, see AUTOMATIC and STATIC - Default - N/A -
Allocate variables to static storage - ev - fno-automatic - qsave - save, -noauto - Mnorecursive, -Msave - N/A -
Compile procedures as if RECURSIVE - eR - frecursive - q recur - assume recursion, -recursive - Mrecursive - Mrecursive -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Arrays -
Option - Cray - GNU - IBM - Intel - PGI - Flang -
Enable coarrays - h caf - fcoarray=key - N/A - coarray[=keyword] - N/A - N/A -
Contiguous array pointers - h contiguous - N/A - qassert=contig - assume contiguous_pointer - N/A - N/A -
Contiguous assumed shape dummy arguments - h contiguous_assumed_shape - frepack-arrays - qassert=contig - assume contiguous_assumed_shape - N/A - N/A -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
OpenACC, OpenMP, and CUDA -
Option - Cray - GNU - IBM - Intel - PGI - Flang -
Enable OpenACC - h acc - fopenacc - N/A - N/A - acc - N/A -
Enable OpenMP - h omp - fopenmp - qswapomp - qopenmp, -qopenmp-lib, -qopenmp-link, -qopenmp-offload, -qopenmp-simd, -qopenmp-stubs, -qopenmp-threadprivate - mp, -Mcuda - mp -
- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -
Miscellaneous -
Option - Cray - GNU - IBM - Intel - PGI - Flang -
Disable compile time range checking - N/A - fno-range-check - N/A - N/A - N/A - N/A -
Disable call site checking - dC - N/A - N/A - N/A - N/A - N/A -
Warn for bad call checking - eb - N/A - N/A - N/A - N/A - N/A -
Set default accessibility of module entities to PRIVATE - N/A - fmodule-private - N/A - N/A - N/A - N/A -
Force FORALL to use temp - N/A - ftest-forall-temp - N/A - N/A - N/A - N/A -
- - - -## Notes
 - -**Standards conformance:** - -All conformance options are similar -- they issue warnings if non-standard features are used. All defaults are to allow extensions without warnings. The GNU, IBM, and Intel compilers allow multiple standard levels to be specified. - - - -* **Cray**: The capital "-eN" option causes error messages, rather than warnings, to be issued for non-compliance. -* **GNU:** The "std=_level_" option specifies the standard to which the program is expected to conform. The default value for std is 'gnu', which specifies a superset of the latest Fortran standard that includes all of the extensions supported by GNU Fortran, although warnings will be given for obsolete extensions not recommended for use in new code. The 'legacy' value is equivalent but without the warnings for obsolete extensions. The 'f95', 'f2003', 'f2008', and 'f2018' values specify strict conformance to the respective standards. Errors are given for all extensions beyond the relevant language standard, and warnings are given for the Fortran 77 features that are permitted but obsolescent in later standards. '-std=f2008ts' allows the Fortran 2008 standard including the additions of the Technical Specification (TS) 29113 on Further Interoperability of Fortran with C and TS 18508 on Additional Parallel Features in Fortran. Values for "_level_" are _f95, f2003, f2008, f2008ts, f2018, gnu,_ and _legacy._ - -**Source format:** - -**Fixed or free source:** Cray, IBM, and Intel default the source format based on the source file suffix as follows: - - - -* **Cray** - * **Free:** .f90, .F90, .f95, .F95, .f03, .F03, .f08, .F08, .ftn, .FTN - * **Fixed:** .f, .F, .for, .FOR -* **Intel** - * **Free:** .f90, .F90, .i90 - * **Fixed:** .f, .for, .FOR, .ftn, .FTN, .fpp, .FPP, .i - -IBM Fortran's options allow the source line length to be specified with the option, e.g., "-qfixed=72". IBM bases the default on the name of the command used to invoke the compiler. 
IBM has 16 different commands that invoke the Fortran compiler, and the default use of free or fixed format and the line length are based on the command name. -qfixed=72 is the default for the xlf, xlf_r, f77, and fort77 commands. -qfree=f90 is the default for the f90, xlf90, xlf90_r, f95, xlf95, xlf95_r, f2003, xlf2003, xlf2003_r, f2008, xlf2008, and xlf2008_r commands. The maximum line length for either source format is 132 characters. - -**Column 1 comment specifier:** All compilers allow "D" in column 1 to specify that the line contains a comment and have this as the default for fixed format source. IBM also supports an "X" in column 1 with the option "-qxlines". - -**Source line length:** - - -* **Cray:** The "-N _col_" option specifies the line width for fixed- and free-format source lines. The value used for col specifies the maximum number of columns per line. For free form sources, col can be set to 132, 255, or 1023. For fixed form sources, col can be set to 72, 80, 132, 255, or 1023. Characters in columns beyond the col specification are ignored. By default, lines are 72 characters wide for fixed-format sources and 255 characters wide for free-form sources. -* **GNU:** For both "ffixed-line-length-_n_" and "ffree-line-length-_n_" options, characters are ignored after the specified length. The default for fixed is 72. The default for free is 132. For free, you can specify 'none' as the length, which means that all characters in the line are meaningful. -* **IBM:** For **fixed**, the default is 72. For **free**, there's no default, but the maximum length for either form is 132. -* **Intel:** The default is 72 for **fixed** and 132 for **free**. 
-* **PGI, Flang:** - * in free form, it is an error if the line is longer than 1000 characters - * in fixed form by default, characters after column 72 are ignored - * in fixed form with -Mextend, characters after column 132 are ignored - -**Names, Literals, and other tokens** - -**Escapes in literals:** - - -* **GNU:** The "-fbackslash" option changes the interpretation of backslashes in string literals from a single backslash character to "C-style" escape characters. The combinations \a, \b, \f, \n, \r, \t, \v, \\, and \0 are expanded to the ASCII characters alert, backspace, form feed, newline, carriage return, horizontal tab, vertical tab, backslash, and NUL, respectively. Additionally, \xnn, \unnnn and \Unnnnnnnn (where each n is a hexadecimal digit) are translated into the Unicode characters corresponding to the specified code points. All other combinations of a character preceded by \ are unexpanded. -* **Intel:** The option "-assume bscc" tells the compiler to treat the backslash character (\) as a C-style control (escape) character in character literals. "nobscc", which is the default, specifies that the backslash character is treated as a normal character in character literals. - -**"$" in symbol names:** Allowing "$" in names is controlled by an option in GNU and is the default behavior in IBM and Intel. Presumably, these compilers issue warnings when standard conformance options are enabled. Dollar signs in names don't seem to be allowed in Cray, PGI, or Flang. - -**DO loop handling** - -**One trip:** - - - -* **IBM:** IBM has two options that do the same thing: "-1" and "-qonetrip". -* **Intel:** Intel used to support a "-onetrip" option, but it has been removed. Intel now supports a "-f66" option that ensures that DO loops are executed at least once in addition to [several other Fortran 66 semantic features](https://software.intel.com/en-us/fortran-compiler-developer-guide-and-reference-f66#320D769C-7C41-4A84-AE0E-50A72296A838). 
 - -**REAL, DOUBLE PRECISION, and COMPLEX Data** - -These size options affect the sizes of variables, literals, and intrinsic function results. - -**Default REAL sizes:** These options do not affect the size of explicitly declared data (for example, REAL(KIND=4)). - - - -* **Cray:** The "-s default32" and "-s default64" options affect REAL, INTEGER, and LOGICAL types. - -**Default DOUBLE PRECISION:** These options allow control of the size of DOUBLE PRECISION types in conjunction with controlling REAL types. - - - -* **Cray:** The "-ep" option controls DOUBLE PRECISION. This option can only be enabled when the default data size is 64 bits ("-s default64" or "-s real64"). When "-s default64" or "-s real64" is specified, and double precision arithmetic is disabled, DOUBLE PRECISION variables and constants specified with the D exponent are converted to default real type (64-bit). If double precision is enabled ("-ep"), they are handled as a double precision type (128-bit). Similarly, when the "-s default64" or "-s real64" option is used, variables declared on a DOUBLE COMPLEX statement and complex constants specified with the D exponent are mapped to the complex type in which each part has a default real type, so the complex variable is 128-bit. If double precision is enabled ("-ep"), each part has double precision type, so the double complex variable is 256-bit. -* **GNU:** The "-fdefault-double-8" option sets the DOUBLE PRECISION type to an 8 byte wide type. It does nothing if this is already the default. If "-fdefault-real-8" is given, DOUBLE PRECISION would instead be promoted to 16 bytes if possible, and "-fdefault-double-8" can be used to prevent this. The kind of real constants like 1.d0 is not changed by "-fdefault-real-8", so "-fdefault-double-8" does not affect it either. - -**Promote or demote REAL type sizes:** These options change the meaning of data types specified by declarations of the form REAL(KIND=_N_), except, perhaps, for PGI. 
 - -* **GNU:** The allowable combinations are "-freal-4-real-8", "-freal-4-real-10", "-freal-4-real-16", "-freal-8-real-4", "-freal-8-real-10", and "-freal-8-real-16". -* **IBM:** The "-qautodbl" option is documented [here](https://www-01.ibm.com/support/docview.wss?uid=swg27024803&aid=1#page=144). -* **PGI:** The "-Mr8" option promotes REAL variables and constants to DOUBLE PRECISION variables and constants, respectively. DOUBLE PRECISION elements are 8 bytes in length. The "-Mr8intrinsics" option promotes the intrinsics CMPLX and REAL to DCMPLX and DBLE, respectively. - -**INTEGER and LOGICAL Data** - -These size options affect the sizes of variables, literals, and intrinsic function results. - -**Default INTEGER sizes:** For all compilers, these options affect both INTEGER and LOGICAL types. - -**Enable 8 and 16 bit INTEGER and LOGICAL:** This Cray option ("-eh") enables support for 8-bit and 16-bit INTEGER and LOGICAL types that use explicit kind or star values. By default ("-eh"), data objects declared as INTEGER(kind=1) or LOGICAL(kind=1) are 8 bits long, and objects declared as INTEGER(kind=2) or LOGICAL(kind=2) are 16 bits long. When this option is disabled ("-dh"), data objects declared as INTEGER(kind=1), INTEGER(kind=2), LOGICAL(kind=1), or LOGICAL(kind=2) are 32 bits long. - -**Intrinsic functions** - -GNU is the only compiler with options governing the use of non-standard intrinsics. For more information on the GNU options, see [here](https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gfortran/Fortran-Dialect-Options.html#Fortran-Dialect-Options). The other compilers implement non-standard intrinsics but don't have options that affect access to them. - -**Arrays** - -**Contiguous array pointers:** All vendors that implement this option (Cray, IBM, and Intel) seem to apply it to all pointer targets. Assuming that the arrays targeted by the pointers are contiguous allows greater optimization. 
- -**Contiguous assumed shape dummy arguments:** Cray and Intel have a separate argument that's specific to assumed shape dummy arguments. - -**Miscellaneous** - -**Disable call site checking:** This Cray option ("-dC") disables some types of standard call site checking. The current Fortran standard requires that the number and types of arguments must agree between the caller and callee. These constraints are enforced in cases where the compiler can detect them, however, specifying "-dC" disables some of this error checking, which may be necessary in order to get some older Fortran codes to compile. If error checking is disabled, unexpected compile-time or run time results may occur. The compiler by default attempts to detect situations in which an interface block should be specified but is not. Specifying "-dC" disables this type of checking as well. - -**Warn for bad call checking**: This Cray option ("-eb") issues a warning message rather than an error message when the compiler detects a call to a procedure with one or more dummy arguments having the TARGET, VOLATILE or ASYNCHRONOUS attribute and there is not an explicit interface definition. - - -## Notes - - -### What is and is not included - -This document focuses on options relevant to the Fortran language. This includes some features (such as recursion) that are only indirectly related. Options related to the following areas are not included: - - - -* Input/Output -* Optimization -* Preprocessing -* Inlining -* Alternate library definition or linking -* Choosing file locations for compiler input or output -* Modules -* Warning and error messages and listing output -* Data initialization -* Run time checks -* Debugging -* Specification of operating system -* Target architecture -* Assembler generation -* Threads or parallelization -* Profiling and code coverage - - -### Data sources - -Here's the list of compilers surveyed, hot linked to the source of data on it. 
Note that this is the only mention of the Oracle and NAG compilers in this document. - - - -* [Cray Fortran Reference Manual version 8.7](https://pubs.cray.com/content/S-3901/8.7/cray-fortran-reference-manual/compiler-command-line-options) -* IBM (XLF) version 14.1 -- [Compiler Reference](https://www-01.ibm.com/support/docview.wss?uid=swg27024803&aid=1#page=93), [Language Reference](https://www-01.ibm.com/support/docview.wss?uid=swg27024776&aid=1) -* [Intel Fortran version 19.0](https://software.intel.com/en-us/fortran-compiler-developer-guide-and-reference-alphabetical-list-of-compiler-options) -* [GNU Fortran Compiler version 8.3.0](https://gcc.gnu.org/onlinedocs/gcc-8.3.0/gfortran/Option-Summary.html) -* [NAG Fortran Release 6.2](https://www.nag.co.uk/nagware/np/r62_doc/manual/compiler_2_4.html) -* [Oracle Fortran version 819-0492-10](https://docs.oracle.com/cd/E19059-01/stud.10/819-0492/3_options.html) -* PGI -- [Compiler Reference version 19.1](https://www.pgroup.com/resources/docs/19.1/x86/pgi-ref-guide/index.htm#cmdln-options-ref), [Fortran Reference Guide version 17](https://www.pgroup.com/doc/pgi17fortref.pdf) -* [Flang](https://github.com/flang-compiler/flang/wiki/Using-Flang) -- information from GitHub - -This document has been kept relatively small by providing links to much of the information about options rather than duplicating that information. For IBM, Intel, and some PGI options, there are direct links. But direct links were not possible for Cray, GNU and some PGI options. - -Many compilers have options that can either be enabled or disabled. Some compilers indicate this by the presence or absence of the letters "no" in the option name (IBM, Intel, and PGI) while Cray precedes many options with either "e" for enabled or "d" for disabled. This document only includes the enabled version of the option specification. - -Deprecated options were generally ignored, even though they were documented. 
diff --git a/flang/documentation/Overview.md b/flang/documentation/Overview.md deleted file mode 100644 --- a/flang/documentation/Overview.md +++ /dev/null @@ -1,103 +0,0 @@ - - -# Overview of Compiler Phases - -Each phase produces either correct output or fatal errors. - -## Prescan and Preprocess - -See: [Preprocessing.md](Preprocessing.md). - -**Input:** Fortran source and header files, command line macro definitions, - set of enabled compiler directives (to be treated as directives rather than - comments). - -**Output:** -- A "cooked" character stream: the entire program as a contiguous stream of - normalized Fortran source. - Extraneous whitespace and comments are removed (except comments that are - compiler directives that are not disabled) and case is normalized. -- Provenance information mapping each character back to the source it came from. - This is used in subsequent phases to issue error messages that refer to source locations. - -**Entry point:** `parser::Parsing::Prescan` - -**Command:** `f18 -E src.f90` dumps the cooked character stream - -## Parse - -**Input:** Cooked character stream. - -**Output:** A parse tree representing a syntactically correct program, - rooted at a `parser::Program`. - See: [Parsing.md](Parsing.md) and [ParserCombinators.md](ParserCombinators.md). - -**Entry point:** `parser::Parsing::Parse` - -**Command:** - - `f18 -fdebug-dump-parse-tree -fparse-only src.f90` dumps the parse tree - - `f18 -funparse src.f90` converts the parse tree to normalized Fortran - -## Validate Labels and Canonicalize Do Statements - -**Input:** Parse tree. - -**Output:** The parse tree with label constraints and construct names checked, - and each `LabelDoStmt` converted to a `NonLabelDoStmt`. - See: [LabelResolution.md](LabelResolution.md). - -**Entry points:** `semantics::ValidateLabels`, `parser::CanonicalizeDo` - -## Resolve Names - -**Input:** Parse tree (without `LabelDoStmt`) and `.mod` files from compilation - of USEd modules. 
- -**Output:** -- Tree of scopes populated with symbols and types -- Parse tree with some refinements: - - each `parser::Name::symbol` field points to one of the symbols - - each `parser::TypeSpec::declTypeSpec` field points to one of the types - - array element references that were parsed as function references or - statement functions are corrected - -**Entry points:** `semantics::ResolveNames`, `semantics::RewriteParseTree` - -**Command:** `f18 -fdebug-dump-symbols -fparse-only src.f90` dumps the - tree of scopes and symbols in each scope - -## Check DO CONCURRENT Constraints - -**Input:** Parse tree with names resolved. - -**Output:** Parse tree with semantically correct DO CONCURRENT loops. - -## Write Module Files - -**Input:** Parse tree with names resolved. - -**Output:** For each module and submodule, a `.mod` file containing a minimal - Fortran representation suitable for compiling program units that depend on it. - See [ModFiles.md](ModFiles.md). - -## Analyze Expressions and Assignments - -**Input:** Parse tree with names resolved. - -**Output:** Parse tree with `parser::Expr::typedExpr` filled in and semantic - checks performed on all expressions and assignment statements. - -**Entry points**: `semantics::AnalyzeExpressions`, `semantics::AnalyzeAssignments` - -## Produce the Intermediate Representation - -**Input:** Parse tree with names and labels resolved. - -**Output:** An intermediate representation of the executable program. - See [FortranIR.md](FortranIR.md). diff --git a/flang/documentation/ParserCombinators.md b/flang/documentation/ParserCombinators.md deleted file mode 100644 --- a/flang/documentation/ParserCombinators.md +++ /dev/null @@ -1,166 +0,0 @@ - - -## Concept -The Fortran language recognizer here can be classified as an LL recursive -descent parser. It is composed from a *parser combinator* library that -defines a few fundamental parsers and a few ways to compose them into more -powerful parsers. 
 - -For our purposes here, a *parser* is any object that attempts to recognize -an instance of some syntax from an input stream. It may succeed or fail. -On success, it may return some semantic value to its caller. - -In C++ terms, a parser is any instance of a class that -1. has a `constexpr` default constructor, -1. defines a type named `resultType`, and -1. provides a function (`const` member or `static`) that accepts a reference to a -`ParseState` as its argument and returns a `std::optional<resultType>` as a -result, with the presence or absence of a value in the `std::optional<>` -signifying success or failure, respectively. -``` -std::optional<resultType> Parse(ParseState &) const; -``` -The `resultType` of a parser is typically the class type of some particular -node type in the parse tree. - -`ParseState` is a class that encapsulates a position in the source stream, -collects messages, and holds a few state flags that determine tokenization -(e.g., are we in a character literal?). Instances of `ParseState` are -independent and complete -- they are cheap to duplicate whenever necessary to -implement backtracking. - -The `constexpr` default constructor of a parser is important. The functions -(below) that operate on instances of parsers are themselves all `constexpr`. -This use of compile-time expressions allows the entirety of a recursive -descent parser for a language to be constructed at compilation time through -the use of templates. - -### Fundamental Predefined Parsers -These objects and functions are (or return) the fundamental parsers: - -* `ok` is a trivial parser that always succeeds without advancing. -* `pure(x)` returns a trivial parser that always succeeds without advancing, - returning some value `x`. -* `pure<T>()` is `pure(T{})` but does not require that T be copy-constructible. -* `fail<T>(msg)` denotes a trivial parser that always fails, emitting the - given message as a side effect. The template parameter is the type of - the value that the parser never returns. 
-* `nextCh` consumes the next character and returns its location, - and fails at EOF. -* `"xyz"_ch` succeeds if the next character consumed matches any of those - in the string and returns its location. Be advised that the source - will have been normalized to lower case (minuscule) letters outside - character and Hollerith literals and edit descriptors before parsing. - -### Combinators -These functions and operators combine existing parsers to generate new parsers. -They are `constexpr`, so they should be viewed as type-safe macros. - -* `!p` succeeds if p fails, and fails if p succeeds. -* `p >> q` fails if p does, otherwise running q and returning its value when - it succeeds. -* `p / q` fails if p does, otherwise running q and returning p's value - if q succeeds. -* `p || q` succeeds if p does, otherwise running q. The two parsers must - have the same type, and the value returned by the first succeeding parser - is the value of the combination. -* `first(p1, p2, ...)` returns the value of the first parser that succeeds. - All of the parsers in the list must return the same type. - It is essentially the same as `p1 || p2 || ...` but has a slightly - faster implementation and may be easier to format in your code. -* `lookAhead(p)` succeeds if p does, but doesn't modify any state. -* `attempt(p)` succeeds if p does, safely preserving state on failure. -* `many(p)` recognizes a greedy sequence of zero or more nonempty successes - of p, and returns `std::list<>` of their values. It always succeeds. -* `some(p)` recognizes a greedy sequence of one or more successes of p. - It fails if p immediately fails. -* `skipMany(p)` is the same as `many(p)`, but it discards the results. -* `maybe(p)` tries to match p, returning a `std::optional<>` value. - It always succeeds. -* `defaulted(p)` matches p, and when p fails it returns a - default-constructed instance of p's resultType. It always succeeds. -* `nonemptySeparated(p, q)` repeatedly matches "p q p q p q ... 
p", - returning a `std::list<>` of only the values of the p's. It fails if - p immediately fails. -* `extension(p)` parses p if strict standard compliance is disabled, - or with a warning if nonstandard usage warnings are enabled. -* `deprecated(p)` parses p if strict standard compliance is disabled, - with a warning if deprecated usage warnings are enabled. -* `inContext(msg, p)` runs p within an error message context; any - message that `p` generates will be tagged with `msg` as its - context. Contexts may nest. -* `withMessage(msg, p)` succeeds if `p` does, and if it does not, - it discards the messages from `p` and fails with the specified message. -* `recovery(p, q)` is equivalent to `p || q`, except that error messages - generated from the first parser are retained, and a flag is set in - the ParseState to remember that error recovery was necessary. -* `localRecovery(msg, p, q)` is equivalent to - `recovery(withMessage(msg, p), q >> pure<A>())` where `A` is the - result type of `p`. - It is useful for targeted error recovery situations within statements. - -Note that -``` -a >> b >> c / d / e -``` -matches a sequence of five parsers, but returns only the result that was -obtained by matching `c`. - -### Applicatives -The following *applicative* combinators combine parsers and modify or -collect the values that they return. - -* `construct<T>(p1, p2, ...)` matches zero or more parsers in succession, - collecting their results and then passing them with move semantics to a - constructor for the type T if they all succeed. - If there is a single parser as the argument and it returns no usable - value but only success or failure (_e.g.,_ `"IF"_tok`), the default - nullary constructor of the type `T` is called. 
-* `sourced(p)` matches p, and fills in its `source` data member with the - locations of the cooked character stream that it consumed -* `applyFunction(f, p1, p2, ...)` matches one or more parsers in succession, - collecting their results and passing them as rvalue reference arguments to - some function, returning its result. -* `applyLambda([](&&x){}, p1, p2, ...)` is the same thing, but for lambdas - and other function objects. -* `applyMem(mf, p1, p2, ...)` is the same thing, but invokes a member - function of the result of the first parser for updates in place. - -### Token Parsers -Last, we have these basic parsers on which the actual grammar of the Fortran -is built. All of the following parsers consume characters acquired from -`nextCh`. - -* `space` always succeeds after consuming any spaces -* `spaceCheck` always succeeds after consuming any spaces, and can emit - a warning if there was no space in free form code before a character - that could continue a name or keyword -* `digit` matches one cooked decimal digit (0-9) -* `letter` matches one cooked letter (A-Z) -* `"..."_tok` match the content of the string, skipping spaces before and - after. Internal spaces are optional matches. The `_tok` suffix is - optional when the parser appears before the combinator `>>` or after - the combinator `/`. -* `"..."_sptok` is a string match in which the spaces are required in - free form source. -* `"..."_id` is a string match for a complete identifier (not a prefix of - a longer identifier or keyword). -* `parenthesized(p)` is shorthand for `"(" >> p / ")"`. -* `bracketed(p)` is shorthand for `"[" >> p / "]"`. -* `nonEmptyList(p)` matches a comma-separated list of one or more - instances of p. -* `nonEmptyList(errorMessage, p)` is equivalent to - `withMessage(errorMessage, nonemptyList(p))`, which allows one to supply - a meaningful error message in the event of an empty list. -* `optionalList(p)` is the same thing, but can be empty, and always succeeds. 
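The parser concept and two of the combinators described above can be sketched in a few lines of C++17. This is a hypothetical miniature written for illustration, not the f18 implementation: `Ch`, `Sequence`, and `Alternatives` are invented names, the toy `ParseState` holds only a position (no messages or flags), and the real library's types and overloads may differ.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>
#include <type_traits>

// Hypothetical miniature: apart from resultType, Parse, and ParseState,
// every name here is an assumption made for this sketch.
struct ParseState {
  std::string_view text; // the whole input
  std::size_t pos{0};    // current position; the struct is cheap to copy
};

// Fundamental parser: consume exactly the character C and return it.
template <char C> struct Ch {
  using resultType = char;
  constexpr Ch() = default;
  std::optional<resultType> Parse(ParseState &state) const {
    if (state.pos < state.text.size() && state.text[state.pos] == C) {
      ++state.pos;
      return C;
    }
    return std::nullopt;
  }
};

// p >> q: fail if p fails, otherwise run q and return q's value.
template <typename PA, typename PB> struct Sequence {
  using resultType = typename PB::resultType;
  constexpr Sequence() = default;
  std::optional<resultType> Parse(ParseState &state) const {
    if (!PA{}.Parse(state)) {
      return std::nullopt;
    }
    return PB{}.Parse(state);
  }
};

// p || q: try p; if it fails, restore a saved copy of the state and try q.
template <typename PA, typename PB> struct Alternatives {
  static_assert(
      std::is_same_v<typename PA::resultType, typename PB::resultType>,
      "both alternatives must return the same type");
  using resultType = typename PA::resultType;
  constexpr Alternatives() = default;
  std::optional<resultType> Parse(ParseState &state) const {
    ParseState saved = state; // backtracking is just a copy
    if (auto result = PA{}.Parse(state)) {
      return result;
    }
    state = saved;
    return PB{}.Parse(state);
  }
};

// The operators only build types; the extra template parameters restrict
// them to classes that define resultType.
template <typename PA, typename PB, typename = typename PA::resultType,
    typename = typename PB::resultType>
constexpr Sequence<PA, PB> operator>>(PA, PB) {
  return Sequence<PA, PB>{};
}

template <typename PA, typename PB, typename = typename PA::resultType,
    typename = typename PB::resultType>
constexpr Alternatives<PA, PB> operator||(PA, PB) {
  return Alternatives<PA, PB>{};
}
```

With these definitions, `(Ch<'a'>{} >> Ch<'b'>{}) || (Ch<'a'>{} >> Ch<'c'>{})` applied to the input `ac` first consumes `a`, fails on `b`, restores the saved state, and then succeeds through the second alternative. Because the parsers are stateless and default-constructible, the whole grammar is encoded in the type of the expression, which is the property the `constexpr` requirement is meant to secure.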
- -### Debugging Parser -Last, a string literal `"..."_debug` denotes a parser that emits the string to -`llvm::errs` and succeeds. It is useful for tracing while debugging a parser but should -obviously not be committed for production code. diff --git a/flang/documentation/Preprocessing.md b/flang/documentation/Preprocessing.md deleted file mode 100644 --- a/flang/documentation/Preprocessing.md +++ /dev/null @@ -1,223 +0,0 @@ - - -Fortran Preprocessing -===================== - -Behavior common to (nearly) all compilers: ------------------------------------------- -* Macro and argument names are sensitive to case. -* Fixed form right margin clipping after column 72 (or 132) - has precedence over macro name recognition, and also over - recognition of function-like parentheses and arguments. -* Fixed form right margin clipping does not apply to directive lines. -* Macro names are not recognized as such when spaces are inserted - into their invocations in fixed form. - This includes spaces at the ends of lines that have been clipped - at column 72 (or whatever). -* Text is rescanned after expansion of macros and arguments. -* Macros are not expanded within quoted character literals or - quoted FORMAT edit descriptors. -* Macro expansion occurs before any effective token pasting via fixed form - space removal. -* C-like line continuations with backslash-newline are allowed in - directives, including the definitions of macro bodies. -* `/* Old style C comments */` are ignored in directives and - removed from the bodies of macro definitions. -* `// New style C comments` are not removed, since Fortran has OPERATOR(//). -* C-like line continuations with backslash-newline can appear in - old-style C comments in directives. -* After `#define FALSE TRUE`, `.FALSE.` is replaced by `.TRUE.`; - i.e., tokenization does not hide the names of operators or logical constants. -* `#define KWM c` allows the use of `KWM` in column 1 as a fixed form comment - line indicator. 
-* A `#define` directive intermixed with continuation lines can't - define a macro that's invoked earlier in the same continued statement. - -Behavior that is not consistent over all extant compilers but which -probably should be uncontroversial: ------------------------------------ -* Invoked macro names can straddle a Fortran line continuation. -* ... unless implicit fixed form card padding intervenes; i.e., - in fixed form, a continued macro name has to be split at column - 72 (or 132). -* Comment lines may appear with continuations in a split macro name. -* Function-like macro invocations can straddle a Fortran fixed form line - continuation between the name and the left parenthesis, and comment and - directive lines can be there too. -* Function-like macro invocations can straddle a Fortran fixed form line - continuation between the parentheses, and comment lines can be there too. -* Macros are not expanded within Hollerith constants or Hollerith - FORMAT edit descriptors. -* Token pasting with `##` works in function-like macros. -* Argument stringization with `#` works in function-like macros. -* Directives can be capitalized (e.g., `#DEFINE`) in fixed form. -* Fixed form clipping after column 72 or 132 is done before macro expansion, - not after. -* C-like line continuation with backslash-newline can appear in the name of - a keyword-like macro definition. -* If `#` is in column 6 in fixed form, it's a continuation marker, not a - directive indicator. -* `#define KWM !` allows KWM to signal a comment. - -Judgement calls, where precedents are unclear: ----------------------------------------------- -* Expressions in `#if` and `#elif` should support both Fortran and C - operators; e.g., `#if 2 .LT. 3` should work. -* If a function-like macro does not close its parentheses, line - continuation should be assumed. -* ... However, the leading parenthesis has to be on the same line as - the name of the function-like macro, or on a continuation line thereof. 
-* If macros expand to text containing `&`, it doesn't work as a free form - line continuation marker. -* `#define c 1` does not allow a `c` in column 1 to be used as a label - in fixed form, rather than as a comment line indicator. -* IBM claims to be ISO C compliant and therefore recognizes trigraph sequences. -* Fortran comments in macro actual arguments should be respected, on - the principle that a macro call should work like a function reference. -* If a `#define` or `#undef` directive appears among continuation - lines, it may or may not affect text in the continued statement that - appeared before the directive. - -Behavior that few compilers properly support (or none), but should: -------------------------------------------------------------------- -* A macro invocation can straddle free form continuation lines in all of their - forms, with continuation allowed in the name, before the arguments, and - within the arguments. -* Directives can be capitalized in free form, too. -* `__VA_ARGS__` and `__VA_OPT__` work in variadic function-like macros. - -In short, a Fortran preprocessor should work as if: ---------------------------------------------------- -1. Fixed form lines are padded up to column 72 (or 132) and clipped thereafter. -2. Fortran comments are removed. -3. C-style line continuations are processed in preprocessing directives. -4. C old-style comments are removed from directives. -5. Fortran line continuations are processed (outside preprocessing directives). - Line continuation rules depend on source form. - Comment lines that are enabled compiler directives have their line - continuations processed. - Conditional compilation preprocessing directives (e.g., `#if`) may - appear among continuation lines, and have their usual effects upon them. -6. Other preprocessing directives are processed and macros expanded. 
- Along the way, Fortran `INCLUDE` lines and preprocessor `#include` directives - are expanded, and all these steps applied recursively to the introduced text. -7. Any Fortran comments created by macro replacement are removed. - -Steps 5 and 6 are interleaved with respect to the preprocessing state. -Conditional compilation preprocessing directives always reflect only the macro -definition state produced by the active `#define` and `#undef` preprocessing directives -that precede them. - -If the source form is changed by means of a compiler directive (i.e., -`!DIR$ FIXED` or `FREE`) in an included source file, its effects cease -at the end of that file. - -Last, if the preprocessor is not integrated into the Fortran compiler, -new Fortran continuation line markers should be introduced into the final -text. - -OpenMP-style directives that look like comments are not addressed by -this scheme but are obvious extensions. - -Appendix -======== -`N` in the table below means "not supported"; this doesn't -mean a bug, it just means that a particular behavior was -not observed. -`E` signifies "error reported". - -The abbreviation `KWM` stands for "keyword macro" and `FLM` means -"function-like macro". - -The first block of tests (`pp0*.F`) are all fixed-form source files; -the second block (`pp1*.F90`) are free-form source files. - -``` -f18 -| pgfortran -| | ifort -| | | gfortran -| | | | xlf -| | | | | nagfor -| | | | | | -. . . . . . pp001.F keyword macros -. . . . . . pp002.F #undef -. . . . . . pp003.F function-like macros -. . . . . . pp004.F KWMs case-sensitive -. N . N N . pp005.F KWM split across continuation, implicit padding -. N . N N . pp006.F ditto, but with intervening *comment line -N N N N N N pp007.F KWM split across continuation, clipped after column 72 -. . . . . . pp008.F KWM with spaces in name at invocation NOT replaced -. N . N N . pp009.F FLM call split across continuation, implicit padding -. N . N N . 
pp010.F ditto, but with intervening *comment line -N N N N N N pp011.F FLM call name split across continuation, clipped -. N . N N . pp012.F FLM call name split across continuation -. E . N N . pp013.F FLM call split between name and ( -. N . N N . pp014.F FLM call split between name and (, with intervening *comment -. E . N N . pp015.F FLM call split between name and (, clipped -. E . N N . pp016.F FLM call split between name and ( and in argument -. . . . . . pp017.F KLM rescan -. . . . . . pp018.F KLM rescan with #undef (so rescan is after expansion) -. . . . . . pp019.F FLM rescan -. . . . . . pp020.F FLM expansion of argument -. . . . . . pp021.F KWM NOT expanded in 'literal' -. . . . . . pp022.F KWM NOT expanded in "literal" -. . E E . E pp023.F KWM NOT expanded in 9HHOLLERITH literal -. . . E . . pp024.F KWM NOT expanded in Hollerith in FORMAT -. . . . . . pp025.F KWM expansion is before token pasting due to fixed-form space removal -. . . E . E pp026.F ## token pasting works in FLM -E . . E E . pp027.F #DEFINE works in fixed form -. N . N N . pp028.F fixed-form clipping done before KWM expansion on source line -. . . . . . pp029.F \ newline allowed in #define -. . . . . . pp030.F /* C comment */ erased from #define -E E E E E E pp031.F // C++ comment NOT erased from #define -. . . . . . pp032.F /* C comment */ \ newline erased from #define -. . . . . . pp033.F /* C comment \ newline */ erased from #define -. . . . . N pp034.F \ newline allowed in name on KWM definition -. E . E E . pp035.F #if 2 .LT. 3 works -. . . . . . pp036.F #define FALSE TRUE ... .FALSE. -> .TRUE. -N N N N N N pp037.F fixed-form clipping NOT applied to #define -. . E . E E pp038.F FLM call with closing ')' on next line (not a continuation) -E . E . E E pp039.F FLM call with '(' on next line (not a continuation) -. . . . . . pp040.F #define KWM c, then KWM works as comment line initiator -E . E . . E pp041.F use KWM expansion as continuation indicators -N N N . . 
N pp042.F #define c 1, then use c as label in fixed-form -. . . . N . pp043.F #define with # in column 6 is a continuation line in fixed-form -E . . . . . pp044.F #define directive amid continuations -. . . . . . pp101.F90 keyword macros -. . . . . . pp102.F90 #undef -. . . . . . pp103.F90 function-like macros -. . . . . . pp104.F90 KWMs case-sensitive -. N N N N N pp105.F90 KWM call name split across continuation, with leading & -. N N N N N pp106.F90 ditto, with & ! comment -N N E E N . pp107.F90 KWM call name split across continuation, no leading &, with & ! comment -N N E E N . pp108.F90 ditto, but without & ! comment -. N N N N N pp109.F90 FLM call name split with leading & -. N N N N N pp110.F90 ditto, with & ! comment -N N E E N . pp111.F90 FLM call name split across continuation, no leading &, with & ! comment -N N E E N . pp112.F90 ditto, but without & ! comment -. N N N N E pp113.F90 FLM call split across continuation between name and (, leading & -. N N N N E pp114.F90 ditto, with & ! comment, leading & -N N N N N . pp115.F90 ditto, with & ! comment, no leading & -N N N N N . pp116.F90 FLM call split between name and (, no leading & -. . . . . . pp117.F90 KWM rescan -. . . . . . pp118.F90 KWM rescan with #undef, proving rescan after expansion -. . . . . . pp119.F90 FLM rescan -. . . . . . pp120.F90 FLM expansion of argument -. . . . . . pp121.F90 KWM NOT expanded in 'literal' -. . . . . . pp122.F90 KWM NOT expanded in "literal" -. . E E . E pp123.F90 KWM NOT expanded in Hollerith literal -. . E E . E pp124.F90 KWM NOT expanded in Hollerith in FORMAT -E . . E E . pp125.F90 #DEFINE works in free form -. . . . . . pp126.F90 \ newline works in #define -N . E . E E pp127.F90 FLM call with closing ')' on next line (not a continuation) -E . E . E E pp128.F90 FLM call with '(' on next line (not a continuation) -. . N . . N pp129.F90 #define KWM !, then KWM works as comment line initiator -E . E . . 
E pp130.F90 #define KWM &, use for continuation w/o pasting (ifort and nag seem to continue #define) -``` diff --git a/flang/documentation/PullRequestChecklist.md b/flang/documentation/PullRequestChecklist.md deleted file mode 100644 --- a/flang/documentation/PullRequestChecklist.md +++ /dev/null @@ -1,47 +0,0 @@ - - -# Pull request checklist -Please review the following items before submitting a pull request. This list -can also be used when reviewing pull requests. -* Verify that new files have a license with correct file name. -* Run `git diff` on all modified files to look for spurious changes such as - `#include `. -* If you added code that causes the compiler to emit a new error message, make - sure that you also added a test that causes that error message to appear - and verifies its correctness. -* Annotate the code and tests with appropriate references to constraint and - requirement numbers from the Fortran standard. Do not include the text of - the constraint or requirement, just its number. -* Alphabetize arbitrary lists of names. -* Check dereferences of pointers and optionals where necessary. -* Ensure that the scopes of all functions and variables are as local as - possible. -* Try to make all functions fit on a screen (40 lines). -* Build and test with both GNU and clang compilers. -* When submitting an update to a pull request, review previous pull request - comments and make sure that you've actually made all of the changes that - were requested. - -## Follow the style guide -The following items are taken from the [C++ style guide](C++style.md). But -even though I've read the style guide, they regularly trip me up. -* Run clang-format using the git-clang-format script from LLVM HEAD. -* Make sure that all source lines have 80 or fewer characters. Note that - clang-format will do this for most code. But you may need to break up long - strings. -* Review declarations for proper use of `constexpr` and `const`. 
-* Follow the C++ [naming guidelines](C++style.md#naming). -* Ensure that the names evoke their purpose and are consistent with existing code. -* Use braced initializers. -* Review pointer and reference types to make sure that you're using them - appropriately. Note that the [C++ style guide](C++style.md) contains a - section that describes all of the pointer types along with their - characteristics. -* Declare non-member functions ```static``` when possible. Prefer - ```static``` functions over functions in anonymous namespaces. diff --git a/flang/documentation/Semantics.md b/flang/documentation/Semantics.md deleted file mode 100644 --- a/flang/documentation/Semantics.md +++ /dev/null @@ -1,156 +0,0 @@ - - -# Semantic Analysis - -The semantic analysis pass determines if a syntactically correct Fortran -program is legal by enforcing the constraints of the language. - -The input is a parse tree with a `Program` node at the root; -and a "cooked" character stream, a contiguous stream of characters -containing a normalized form of the Fortran source. - -The semantic analysis pass takes a parse tree for a syntactically -correct Fortran program and determines whether it is legal by enforcing -the constraints of the language. - -If the program is not legal, the results of the semantic pass will be a list of -errors associated with the program. - -If the program is legal, the semantic pass will produce a (possibly modified) -parse tree for the semantically correct program with each name mapped to a symbol -and each expression fully analyzed. - -All user errors are detected either prior to or during semantic analysis. -After it completes successfully the program should compile with no error messages. -There may still be warnings or informational messages. - -## Phases of Semantic Analysis - -1. [Validate labels](#validate-labels) - - Check all constraints on labels and branches -2. 
[Rewrite DO loops](#rewrite-do-loops) - - Convert all occurrences of `LabelDoStmt` to `DoConstruct`. -3. [Name resolution](#name-resolution) - - Analyze names and declarations, build a tree of Scopes containing Symbols, - and fill in the `Name::symbol` data member in the parse tree -4. [Rewrite parse tree](#rewrite-parse-tree) - - Fix incorrect parses based on symbol information -5. [Expression analysis](#expression-analysis) - - Analyze all expressions in the parse tree and fill in `Expr::typedExpr` and - `Variable::typedExpr` with analyzed expressions; fix incorrect parses - based on the result of this analysis -6. [Statement semantics](#statement-semantics) - - Perform remaining semantic checks on the execution parts of subprograms -7. [Write module files](#write-module-files) - - If no errors have occurred, write out `.mod` files for modules and submodules - -If phase 1 or phase 2 encounter an error on any of the program units, -compilation terminates. Otherwise, phases 3-6 are all performed even if -errors occur. -Module files are written (phase 7) only if there are no errors. - -### Validate labels - -Perform semantic checks related to labels and branches: -- check that any labels that are referenced are defined and in scope -- check branches into loop bodies -- check that labeled `DO` loops are properly nested -- check labels in data transfer statements - -### Rewrite DO loops - -This phase normalizes the parse tree by removing all unstructured `DO` loops -and replacing them with `DO` constructs. - -### Name resolution - -The name resolution phase walks the parse tree and constructs the symbol table. - -The symbol table consists of a tree of `Scope` objects rooted at the global scope. -The global scope is owned by the `SemanticsContext` object. -It contains a `Scope` for each program unit in the compilation. - -Each `Scope` in the scope tree contains child scopes representing other scopes -lexically nested in it. 
-Each `Scope` also contains a map of `CharBlock` to `Symbol` representing names -declared in that scope. (All names in the symbol table are represented as -`CharBlock` objects, i.e. as substrings of the cooked character stream.) - -All `Symbol` objects are owned by the symbol table data structures. -They should be accessed as `Symbol *` or `Symbol &` outside of the symbol -table classes as they can't be created, copied, or moved. -The `Symbol` class has functions and data common across all symbols, and a -`details` field that contains more information specific to that type of symbol. -Many symbols also have types, represented by `DeclTypeSpec`. -Types are also owned by scopes. - -Name resolution happens on the parse tree in this order: -1. Process the specification of a program unit: - 1. Create a new scope for the unit - 2. Create a symbol for each contained subprogram containing just the name - 3. Process the opening statement of the unit (`ModuleStmt`, `FunctionStmt`, etc.) - 4. Process the specification part of the unit -2. Apply the same process recursively to nested subprograms -3. Process the execution part of the program unit -4. Process the execution parts of nested subprograms recursively - -After the completion of this phase, every `Name` corresponds to a `Symbol` -unless an error occurred. - -### Rewrite parse tree - -The parser cannot build a completely correct parse tree without symbol information. -This phase corrects mis-parses based on symbols: -- Array element assignments may be parsed as statement functions: `a(i) = ...` -- Namelist group names without `NML=` may be parsed as format expressions -- A file unit number expression may be parsed as a character variable - -This phase also produces an internal error if it finds a `Name` that does not -have its `symbol` data member filled in. This error is suppressed if other -errors have occurred because in that case a `Name` corresponding to an erroneous -symbol may not be resolved. 
- -### Expression analysis - -Expressions that occur in the specification part are analyzed during name -resolution, for example, initial values, array bounds, type parameters. -Any remaining expressions are analyzed in this phase. - -For each `Variable` and top-level `Expr` (i.e. one that is not nested below -another `Expr` in the parse tree) the analyzed form of the expression is saved -in the `typedExpr` data member. After this phase has completed, the analyzed -expression can be accessed using `semantics::GetExpr()`. - -This phase also corrects mis-parses based on the result of expression analysis: -- An expression like `a(b)` is parsed as a function reference but may need - to be rewritten to an array element reference (if `a` is an object entity) - or to a structure constructor (if `a` is a derived type) -- An expression like `a(b:c)` is parsed as an array section but may need to be - rewritten as a substring if `a` is an object with type CHARACTER - -### Statement semantics - -Multiple independent checkers driven by the `SemanticsVisitor` framework -perform the remaining semantic checks. -By this phase, all names and expressions that can be successfully resolved -have been. But there may be names without symbols or expressions without -analyzed form if errors occurred earlier. - -### Write module files - -Separate compilation information is written out on successful compilation -of modules and submodules. These are used as input to name resolution -in program units that `USE` the modules. - -Module files are stripped down Fortran source for the module. -Parts that aren't needed to compile dependent program units (e.g. action statements) -are omitted. - -The module file for module `m` is named `m.mod` and the module file for -submodule `s` of module `m` is named `m-s.mod`.