diff --git a/flang/docs/F202X.md b/flang/docs/F202X.md
new file mode 100644
--- /dev/null
+++ b/flang/docs/F202X.md
@@ -0,0 +1,355 @@
+
+
+# A first take on Fortran 202X features for LLVM Flang
+
+I (Peter Klausler) have been studying the draft PDF of the
+[Fortran 202X standard](https://j3-fortran.org/doc/year/23/23-007r1.pdf),
+which will soon be published as ISO Fortran 2023.
+I have compiled this summary of its changes relative to
+the current Fortran 2018 standard from the perspective
+of a [Fortran compiler](https://github.com/llvm/llvm-project/tree/main/flang)
+implementor.
+
+## TL;DR
+
+Fortran 202X doesn't make very many changes to the language
+relative to Fortran 2018, which was itself a small increment
+over Fortran 2008.
+Apart from `REDUCE` clauses that were added to the
+[still broken](https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md)
+`DO CONCURRENT` construct, there's little here for Fortran users
+to get excited about.
+
+## Priority of implementation in LLVM Flang
+
+We are working hard to ensure that existing working applications will
+port successfully to LLVM Flang with minimal effort.
+I am not particularly concerned with conforming to a new
+standard as an end in itself.
+
+The only features below that appear to have already been implemented
+in other compilers are the `REDUCE` clauses and the degree trigonometric
+intrinsic functions, so those should have priority as an aid to
+portability.
+We would want to support them earlier even if they were not in a standard.
+
+The `REDUCE` clause also merits early implementation due to
+its potential for performance improvements in real codes.
+I don't see any other feature here that would be relevant to
+performance (maybe a weak argument could be made for `SIMPLE`).
+The bulk of this revision unfortunately comprises changes to Fortran that
+are neither performance-related, already available in
+some compilers, nor (obviously) in use in existing codes.
+I will not prioritize implementing them myself over
+other work until they become portability concerns or are
+requested by actual users.
+
+Given Fortran's history of the latency between new
+standards and the support for their features in real compilers,
+and then the extra lag before the features are then actually used
+in codes meant to be portable, I doubt that many of the items
+below will have to be worked on any time soon due to user demand.
+
+If J3 had chosen to add more features that were material improvements
+to Fortran -- and there's quite a long list of worthy candidates that
+were passed over, like read-only pointers -- it would have made sense
+for me to prioritize their implementation in LLVM Flang more
+urgently.
+
+## Specific change descriptions
+
+The individual features added to the language are summarized
+in what I see as their order of significance to Fortran users.
+
+### Alert: There's a breaking change!
+
+The Fortran committee used to abhor making breaking changes,
+apart from fixes, so that conforming codes could be portable across
+time as well as across compilers.
+Fortran 202X, however, uncharacteristically perpetrates one such
+change to existing semantics that will silently cause existing
+codes to work differently, if that change were to be implemented
+and enabled by default.
+
+Specifically, automatic reallocation of whole deferred-length character
+allocatable scalars is now mandated when they appear for internal output
+(e.g., `WRITE(A,*) ...`)
+or as output arguments for some statements and intrinsic procedures
+(e.g., `IOMSG=`, `ERRMSG=`).
+So existing codes that allocate output buffers
+for such things will, or would, now observe that their buffers are
+silently changing their lengths during execution, rather than being
+padded with blanks or being truncated.
+For example:
+
+```
+  character(:), allocatable :: buffer
+  allocate(character(20)::buffer)
+  write(buffer,'(F5.3)') 3.14159
+  print *, len(buffer)
+```
+
+prints 20 with Fortran 2018 but would print 5 with Fortran 202X.
+
+There would have been no problem with the new standard changing the
+behavior in the current error case of an unallocated variable;
+defining new semantics for old errors is a generally safe means
+for extending a programming language.
+However, in this case, we'll need to protect existing conforming
+codes from the surprising new reallocation semantics, which
+affect cases that are not errors.
+
+When/if there are requests from real users to implement this breaking
+change, and if it is implemented, I'll have to ensure that users
+have the ability to control this change in behavior via an option and/or the
+runtime environment, and when it's enabled, emit a warning at code
+sites that are at risk.
+This warning should mention a source change they can make to protect
+themselves from this change by passing the complete substring (`A(:)`)
+instead of a whole character allocatable.
+
+This feature reminds me of Fortran 2003's change to whole
+allocatable array assignment, although in that case users were
+put at risk only of extra runtime overhead that was needless in
+existing codes, not a change in behavior, and users learned to
+assign to whole array sections (`A(:)=...`) rather than to whole
+allocatable arrays where the performance hit mattered.
+
+### Major Items
+
+The features in this section are expensive to implement in
+terms of engineering time to design, code, refactor, and test
+(i.e., weeks or months, not days).
+
+#### `DO CONCURRENT REDUCE`
+
+J3 continues to ignore the
+[serious semantic problems](https://github.com/llvm/llvm-project/blob/main/flang/docs/DoConcurrent.md)
+with `DO CONCURRENT`, despite the simplicity of the necessary fix and their
+admirable willingness to repair the standard to fix problems with
+other features (e.g., plugging holes in `PURE` procedure requirements)
+and their less admirable willingness to make breaking changes (see above).
+They did add `REDUCE` clauses to `DO CONCURRENT`, and those seem to be
+immediately useful to HPC codes and worth implementing soon.
+
+#### `SIMPLE` procedures
+
+The new `SIMPLE` procedures constitute a subset of F'95/HPF's `PURE`
+procedures.
+There are things that one can do in a `PURE` procedure
+but cannot in a `SIMPLE` one. But the virtue of being `SIMPLE` seems
+to be its own reward, not a requirement to access any other
+feature.
+
+`SIMPLE` procedures might have been more useful had `DO CONCURRENT` been
+changed to require callees to be `SIMPLE`, not just `PURE`.
+
+The implementation of `SIMPLE` will be nontrivial: it involves
+some parsing and symbol table work, and some generalization of the
+predicate function `IsPureProcedure()`, extending the semantic checking on
+calls in `PURE` procedures to ensure that `SIMPLE` procedures
+only call other `SIMPLE` procedures, and modifying the intrinsic
+procedure table to note that most intrinsics are now `SIMPLE`
+rather than just `PURE`.
+
+I don't expect any codes to rush to change their `PURE` procedures
+to be `SIMPLE`, since it buys little and reduces portability.
+This makes `SIMPLE` a lower-priority feature.
+
+#### Conditional expressions and actual arguments
+
+Next on the list of "big ticket" items are C-style conditional
+expressions. These come in two forms, each of which is a distinct
+feature that would be nontrivial to implement, and I would not be
+surprised to see some compilers implement one before the other.
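+
+As a rough orientation, the two forms might look like the sketch below.
+It assumes the syntax of the draft standard; the names `demo`, `show`,
+`n`, `k`, and `label` are hypothetical, and compilers may not accept
+this code yet.
+
+```
+program demo
+  implicit none
+  integer :: n, k
+  n = -3
+  ! Form 1: a conditional expression used as an ordinary primary.
+  k = (n >= 0 ? n : -n)
+  ! Form 2: a conditional actual argument; .NIL. leaves the optional
+  ! dummy argument absent when the condition is false.
+  call show(k, (k > 2 ? 'big' : .nil.))
+ contains
+  subroutine show(value, label)
+    integer, intent(in) :: value
+    character(*), intent(in), optional :: label
+    if (present(label)) then
+      print *, label, value
+    else
+      print *, value
+    end if
+  end subroutine
+end program
+```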
+
+The first form is a new parenthesized expression primary that any C programmer
+would recognize. It has straightforward parsing and semantics,
+but will require support in folding and all other code that
+processes expressions. Lowering will be nontrivial due to
+control flow.
+
+The second form is a conditional actual argument syntax
+that allows runtime selection of argument associations, as well
+as a `.NIL.` syntax for optional arguments to signify an absent actual
+argument. This would have been more useful if it had also been
+allowed as a pointer assignment statement right-hand side, and
+that might be a worthwhile extension. As this form is essentially
+a conditional variable reference it may be cleaner to have a
+distinct representation from the conditional expression primary
+in the parse tree and strongly-typed `Expr` representations.
+
+#### `ENUMERATION TYPE`
+
+Fortran 202X has a new category of type. The new non-interoperable
+`ENUMERATION TYPE` feature is like C++'s `enum class` -- not, unfortunately,
+a powerful sum data type as in Haskell or Rust. Unlike the
+current `ENUM, BIND(C)` feature, `ENUMERATION TYPE` defines a new
+type name and its distinct values.
+
+This feature may well be the item requiring the largest patch to
+the compiler for its implementation, as it affects parsing,
+type checking on assignment and argument association, generic
+resolution, formatted I/O, NAMELIST, debugging symbols, &c.
+It will indirectly affect every switch statement in the compiler
+that switches over the six (now seven) type categories.
+This will be a big project for little useful return to users.
+
+#### `TYPEOF` and `CLASSOF`
+
+Last on the list of "big ticket" items are the new `TYPEOF` and `CLASSOF`
+type specifiers, which allow declarations to indirectly use the
+types of previously-defined entities. These would have obvious utility
+in a language with type polymorphism but aren't going to be very
+useful yet in Fortran 202X (esp.
+`TYPEOF`), although they would be worth
+supporting as a utility feature for a parametric module extension.
+
+`CLASSOF` has implications for semantics and lowering that need to
+be thought through as it seems to provide a means of
+declaring polymorphic local variables and function results that are
+neither allocatables nor pointers.
+
+#### Coarray extensions:
+
+ * `NOTIFY_TYPE`, `NOTIFY WAIT` statement, `NOTIFY=` specifier on image selector
+ * Arrays with coarray components
+
+#### "Rank Independent" Features
+
+The `RANK(n)` attribute declaration syntax is equivalent to
+`DIMENSION(:,:,...,:)` or an equivalent entity-decl containing `n` colons.
+As `n` must be a constant expression, that's straightforward to implement,
+though not terribly useful until the language acquires additional features.
+(I can see some utility in being able to declare PDT components with a
+`RANK` that depends on a `KIND` type parameter.)
+
+It is now possible to declare the lower and upper bounds of an explicit
+shape entity using a constant-length vector specification expression
+in a declaration, `ALLOCATE` statement, or pointer assignment with
+bounds remapping.
+For example, `real A([2,3])` is equivalent to `real A(2,3)`.
+
+The new `A(@V)` "multiple subscript" indexing syntax uses an integer
+vector to supply a list of subscripts or of triplet bounds/strides. This one
+has tough edge cases for lowering that need to be thought through;
+for example, when the lengths of two or more of the vectors in
+`A(@U,@V,@W)` are not known at compilation time, implementing the indexing
+would be tricky in generated code and might just end up filling a
+temporary with `[U,V,W]` first.
+
+The obvious use case for "multiple subscripts" would be as a means to
+index into an assumed-rank dummy argument without the bother of a `SELECT RANK`
+construct, but that usage is not supported in Fortran 202X.
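+
+The three pieces described above can be sketched together as follows.
+This assumes the draft syntax; the names `rank_demo`, `a`, `b`, and `v`
+are made up for illustration, and current compilers may reject it.
+
+```
+program rank_demo
+  implicit none
+  real, rank(2), allocatable :: a   ! same as: real, allocatable :: a(:,:)
+  real :: b([2,3])                  ! same as: real :: b(2,3)
+  integer :: v(2)                   ! constant-length subscript vector
+  allocate(a(2,3))
+  a = 0.0
+  b = 1.0
+  v = [2, 3]
+  a(@v) = b(@v)                     ! same as: a(2,3) = b(2,3)
+  print *, a(2,3)
+end program
+```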
+
+This feature may well turn out to be Fortran 202X's analog to Fortran 2003's
+`LEN` derived type parameters.
+
+### Minor Items
+
+So much for the major features of Fortran 202X. The longer list
+of minor features can be more briefly summarized.
+
+#### New Edit Descriptors
+
+Fortran 202X has some noncontroversial small tweaks to formatted output.
+The `AT` edit descriptor automatically trims character output. The `LZP`,
+`LZS`, and `LZ` control edit descriptors and `LEADING_ZERO=` specifier provide a
+means for controlling the output of leading zero digits.
+
+#### Intrinsic Module Extensions
+
+Addressing some issues and omissions in intrinsic modules:
+
+ * LOGICAL8/16/32/64 and REAL16
+ * IEEE module facilities upgraded to match latest IEEE FP standard
+ * C_F_STRPOINTER, F_C_STRING for NUL-terminated strings
+ * C_F_POINTER(LOWER=)
+
+#### Intrinsic Procedure Extensions
+
+The `SYSTEM_CLOCK` intrinsic subroutine got some semantic tweaks.
+
+There are new intrinsic functions for trigonometric functions in
+units of degrees and half-circles.
+GNU Fortran already supports the forms that use degree units.
+These should call into math library implementations that are
+specialized for those units rather than simply multiplying
+arguments or results by conversion factors.
+
+ * `ACOSD`, `ASIND`, `ATAND`, `ATAN2D`, `COSD`, `SIND`, `TAND`
+ * `ACOSPI`, `ASINPI`, `ATANPI`, `ATAN2PI`, `COSPI`, `SINPI`, `TANPI`
+
+`SELECTED_LOGICAL_KIND` maps a bit size to a kind of `LOGICAL`.
+
+There are two new character utility intrinsic
+subroutines whose implementations have very low priority: `SPLIT` and `TOKENIZE`.
+`TOKENIZE` requires memory allocation to return its results,
+and could and should have been implemented once in some Fortran utility
+library for those who need a slow tokenization facility rather than
+requiring implementations in each vendor's runtime support library with
+all the extra cost and compatibility risk that entails.
+
+`SPLIT` is worse -- not only could it, like `TOKENIZE`,
+have been supplied by a Fortran utility library rather than being
+added to the standard, it's redundant;
+it provides nothing that cannot be already accomplished by
+composing today's `SCAN` intrinsic function with substring indexing:
+
+```
+module m
+  interface split
+    module procedure :: split
+  end interface
+  !instantiate for all possible ck/ik/lk combinations
+  integer, parameter :: ck = kind(''), ik = kind(0), lk = kind(.true.)
+ contains
+  simple elemental subroutine split(string, set, pos, back)
+    character(*, kind=ck), intent(in) :: string, set
+    integer(kind=ik), intent(in out) :: pos
+    logical(kind=lk), intent(in), optional :: back
+    integer(kind=ik) :: npos
+    if (present(back)) then
+      if (back) then
+        pos = scan(string(:pos-1), set, .true.)
+        return
+      end if
+    end if
+    npos = scan(string(pos+1:), set)
+    pos = merge(pos + npos, len(string) + 1, npos /= 0)
+  end
+end
+```
+
+(The code above isn't a proposed implementation for `SPLIT`, just a
+demonstration of how programs could use `SCAN` to accomplish the same
+results today.)
+
+## Source limitations
+
+Fortran 202X raises the maximum number of characters per free form
+source line and the maximum total number of characters per statement.
+Both of these have always been unlimited in this compiler (or,
+to be more accurate, limited only by available memory).
+
+## More BOZ usage opportunities
+
+BOZ literal constants (binary, octal, and hexadecimal constants,
+also known as "typeless" values) have more conforming usage in the
+new standard in contexts where the type is unambiguously known.
+They may now appear as initializers, as right-hand sides of intrinsic
+assignments to integer and real variables, in explicitly typed
+array constructors, and in the definitions of enumerations.
+
+## Citation updates
+
+The source base contains hundreds of references to the subclauses,
+requirements, and constraints of the Fortran 2018 standard, mostly in code comments.
+These will need to be mapped to their Fortran 202X counterparts once the
+new standard is published, as the Fortran committee does not provide a
+means for citing these items by names that are fixed over time like the
+C++ committee does.
+If we had access to the LaTeX sources of the standard, we could generate
+a mapping table and automate this update.