This is a series of patches that adds a new pragma for loop transformations. I hope for feedback before the LLVM DevMtg, where this will be the topic of my talk. The talk will give an overview of how to add such an extension that touches all of clang's layers, and I would hate to give wrong advice.
The syntax is:
#pragma clang transform distribute
#pragma clang transform unroll/unrollandjam [full/partial(n)]
#pragma clang transform vectorize [width(n)]
#pragma clang transform interleave [factor(n)]
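As a rough illustration of the syntax, the following sketch puts one directive in front of a loop. The function name saxpy and the unroll factor are made up for this example. A compiler without this patch series should simply ignore the pragma (possibly with an unknown-pragma warning), so the code computes the same result either way.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical example function: y[i] += a * x[i] for every element.
// The pragma asks for a partial unroll by 4. It is only a hint, so a compiler
// that does not know "#pragma clang transform" ignores it and the loop still
// computes the same values.
void saxpy(float a, const std::vector<float> &x, std::vector<float> &y) {
#pragma clang transform unroll partial(4)
  for (std::size_t i = 0; i < x.size(); ++i)
    y[i] += a * x[i];
}
```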
The selection is limited to the passes LLVM currently supports. I am working on more transformations that are currently only picked up by Polly. The biggest difference from #pragma clang loop is that it allows specifying the order in which the transformations are applied, which clang's current LoopHint attribute ignores. It is also designed a bit more carefully: e.g. vectorize and interleave are unambiguously different transformations, and there is no question of whether setting an optimization option also enables the transformation.
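To make the contrast concrete, here is a small sketch with the same loop annotated both ways. The function names are hypothetical. It assumes that several transformations are requested by stacking one directive per transformation above the loop. Which of the stacked directives is applied first is not spelled out in this summary, so the comments deliberately do not claim a direction.

```cpp
#include <cstddef>

// Existing syntax: both hints end up in one LoopHint attribute, and the
// source cannot express which transformation should happen first.
void scale_loop_hint(float *a, std::size_t n) {
#pragma clang loop vectorize_width(4) unroll_count(2)
  for (std::size_t i = 0; i < n; ++i)
    a[i] *= 2.0f;
}

// Proposed syntax (assumed stacking form): one directive per transformation,
// so the textual order of the directives can carry meaning.
void scale_transform(float *a, std::size_t n) {
#pragma clang transform unroll partial(2)
#pragma clang transform vectorize width(4)
  for (std::size_t i = 0; i < n; ++i)
    a[i] *= 2.0f;
}
```

Both functions compute the same result; the pragmas only influence how the loop is optimized.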
In the longer term, we plan to add more features such as:
- More transformations (tiling, fusion, interchange, array packing, reversal, wavefronting, peeling, splitting, space-filling curves, unswitching, collapsing, strip/strip-mining, blocking)
- More options
- Assigning identifiers to code so that it can be referenced by transformations (e.g. tile a loop nest, vectorize the second-to-innermost loop and parallelize the outermost)
- Non-loop transformations (code motion, versioning, ...)
- OpenMP compatibility
Regarding the last item, we are adding loop transformations to the OpenMP specification. The next technical report, presented at SC'19, will feature a tiling transformation. As such, this patch is inspired by clang's OpenMP implementation to make later integration easier. It is not OpenMP, though: for instance, the OpenMP construct will apply tiling regardless of semantic equivalence, while #pragma clang transform takes the classical compiler-hint approach in that it (by default) still does a correctness check and only influences the profitability heuristic.
A previous prototype was closer to how #pragma clang loop is implemented, using attributes instead of adding an additional kind of AST node. This showed its limitations: it did not allow all use cases (such as a #pragma without a following statement), and its argument format can only store an array of in-source identifiers and expressions. The prototype also used the #pragma clang loop syntax, but it proved difficult to disambiguate whether the transformations are ordered or not.
The patch is split into multiple reviews:
- [this patch] The lexer part adds annotation begin- and end-tokens to the token stream, as OpenMP does.
- D69089: The parser part parses the tokens between the annotation tokens and calls the ActOn... methods of Sema, which are empty in this patch. The subclasses of Transform represent the transformation to apply (e.g. "unroll by a factor of 4") and its properties ("consumes one loop and emits one main loop and a remainder").
- D69091: The sema part also adds the AST node kinds: the Stmt representing the #pragma (TransformExecutableDirective) and the clauses (TransformClause). Moreover, the AnalysisTransform component constructs a loop nest tree to which the transformations are applied, so that Sema can warn about inconsistencies, e.g. when there is no inner loop, or the inner loop is ambiguous, for unrollandjam.
- D69092: The codegen part uses the same AnalysisTransform to determine which loop metadata nodes to emit.
- D70572: (De-)serialization of TransformExecutableDirective and its clauses for modules and precompiled headers.
- D71447: CIndex for libclang AST traversal
- D70032: Documentation update
- Optional parts not yet ready such as completion, ASTMatcher and tooling
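The Transform subclasses mentioned for the parser part (D69089) could be pictured roughly as in the sketch below. All class and member names here (TransformSketch, PartialUnrollSketch, loopsConsumed, loopsEmitted) are hypothetical and chosen for illustration only; they are not the interface of the actual patch.

```cpp
#include <string>

// Hypothetical stand-in for a Transform base class: it describes which
// transformation to apply and how it changes the loop structure.
struct TransformSketch {
  virtual ~TransformSketch() = default;
  virtual std::string describe() const = 0;
  virtual int loopsConsumed() const = 0; // loops taken as input
  virtual int loopsEmitted() const = 0;  // loops left after the transformation
};

// "Unroll by a factor of N": consumes one loop and emits a main loop plus a
// remainder loop, matching the example given in the parser-part description.
struct PartialUnrollSketch : TransformSketch {
  explicit PartialUnrollSketch(int Factor) : Factor(Factor) {}
  std::string describe() const override {
    return "unroll by a factor of " + std::to_string(Factor);
  }
  int loopsConsumed() const override { return 1; }
  int loopsEmitted() const override { return 2; }
  int Factor;
};
```

Keeping the input/output loop counts on the transformation object is what would let a later stage check whether a following transformation still has a loop to apply to.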
Thanks in advance for the review!