This is an archive of the discontinued LLVM Phabricator instance.

[Lex] #pragma clang transform
Needs ReviewPublic

Authored by Meinersbur on Oct 17 2019, 12:21 AM.

Details

Summary

This is a series of patches that adds a new pragma for loop transformations. I hope for feedback before the LLVM DevMtg where this will be the topic of my talk. The talk will give an overview about how to add such an extension that touches all of clang's layers and would hate to give wrong advice.

The syntax is:

#pragma clang transform distribute
#pragma clang transform unroll/unrollandjam [full/partial(n)]
#pragma clang transform vectorize [width(n)]
#pragma clang transform interleave [factor(n)]

The selection is currently limited to the passes LLVM currently supports. I am working on more transformations that currently are only picked-up by Polly. The biggest difference to #pragma clang loop it allows to specify in which order the transformations are applied, which is ignored by clang's current LoopHint attribute. It is also designed a bit more carefully, e.g. vectorize and interleave are unambiguously different transformations and no question whether setting an optimization option also enables the transformations.

In the longer term, we plan to add more features such as:

  • More transformations (tiling, fusion, interchange, array packing, reversal, wavefronting, peeling, splitting, space-filling curves, unswitching, collapsing, strip/strip-mining, blocking, )
  • More options
  • Assigning identifiers to code such that they can be referenced by transformations. (e.g. tile a loop nest, vectorize the second-to-innermost loop and parallelize the outermost).
  • Non-loop transformations (code motion, versioning, ...)
  • OpenMP compatibility

Regarding the latter item, we are adding loop transformation to the OpenMP specification. The next technical report presented at SC'19 will feature a tiling transformation. As such, this patch is inspired by clang's OpenMP implementation to make an integration later easier. It's not OpenMP though, in that for instance the OpenMP construct will apply tiling regardless of semantic equivalence while #pragma clang transform takes the classical compiler-hint approach in that it (by default) still does a correctness check, only
influencing the profitability heuristic.

A previous prototype that was closer to how #pragma clang loop is implemented using attributes instead of adding an additional kind of AST nodes. This showed its limitations in that it did not allow all use-cases (such as #pragma without a following statement) and its argument format can only store an array of in-source identifiers and expressions. The prototype also used the '#pragma clang loop syntax, but it proved difficult to disambiguate whether the transformations are ordered or not.

The patch is split into multiple reviews:

  • [this patch] The lexer part adds annotation begin- and end-tokens to the token stream, as OpenMP does.
  • D69089: The parser part parses the tokens between the annotation tokens and calls ActOn... methods of Sema which are empty in this patch. The subclasses of Transform represent the transformation to apply (e.g. "unroll by a factor of 4") and its properties ("consumes 1 loop and emits one main loop and a remainder").
  • D69091: The sema part also adds the AST nodes kinds: the Stmt representing the #pragma (TransformExecutableDirective) and the clauses (TransformClause). Moreover, the AnalysisTransform component constructs a loop nest tree to which transformations are applied to such that Sema can warn about inconsistencies, e.g. there is no inner or ambiguous loops for unrollandjam.
  • D69092: The codegen part uses the same AnalysisTransform to determine which loop metadata nodes to emit.
  • D70572: (De-)serialization of TransformExecutableDirective and its clauses for modules and precompiled headers.
  • D71447: CIndex for libclang AST traversal
  • D70032: Documentation update
  • Optional parts not yet ready such as completion, ASTMatcher and tooling

Thanks in advance for the review!

Event Timeline

Meinersbur created this revision.Oct 17 2019, 12:21 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: cfe-commits. · View Herald Transcript

Why not try to improve the existing #pragma clang loop rather than add a new pragma with almost the same behavior?

@Meinersbur, if I remember correctly, there was an RFC discussion on this topic, right? If yes, would you post the pointer to that? I need a refresher on what has been discussed/settled in the past.

Meinersbur added a comment.EditedOct 17 2019, 2:26 PM

Why not try to improve the existing #pragma clang loop rather than add a new pragma with almost the same behavior?

The behavior and syntax is different. #pragma clang loop ignores the order, i.e.

#pragma clang loop unroll(enable)
#pragma clang loop distribute(enable)

and

#pragma clang loop distribute(enable)
#pragma clang loop unroll(enable)

and

#pragma clang loop unroll(enable) distribute(enable)

are the same. Changing that would be a breaking change.

Syntactically, every option is it's own transformation, e.g.

#pragma clang loop unroll(enable) distribute(enable) unroll_count(2)

could be interpreted as 3 transformations (LoopUnroll even exists twice in the pass pipeline). I prefer OpenMP's directive-with-clauses syntax, which we need to implement anyway for the OpenMP loop transformations.

In the future, I would also like to add non-loop transformation, making the loop namespace unfavorable.

@Meinersbur, if I remember correctly, there was an RFC discussion on this topic, right? If yes, would you post the pointer to that? I need a refresher on what has been discussed/settled in the past.

https://lists.llvm.org/pipermail/cfe-dev/2018-May/058141.html

@Meinersbur, if I remember correctly, there was an RFC discussion on this topic, right? If yes, would you post the pointer to that? I need a refresher on what has been discussed/settled in the past.

https://lists.llvm.org/pipermail/cfe-dev/2018-May/058141.html

Sorry if this is answered in the patches but what happens if a loop has both #pragma clang loop and transform defined before it? I guess it probably shouldn't work.

Perhaps instead you could create a new option to indicate that the order should be respected.

#pragma clang loop respect_order <- optionally with (true) or (false)

That approach would avoid the inevitable conflicts of having both loop and transform pragmas on the same loop.

(Sorry if you received this twice)

@Meinersbur, if I remember correctly, there was an RFC discussion on this topic, right? If yes, would you post the pointer to that? I need a refresher on what has been discussed/settled in the past.

https://lists.llvm.org/pipermail/cfe-dev/2018-May/058141.html

Sorry if this is answered in the patches but what happens if a loop has both #pragma clang loop and transform defined before it? I guess it probably shouldn't work.

Yes, the plan was to make it an hard error. I unfortunately forgot to add that to D69091, thanks for reminding me. In the current implementation the #pragma clang loop would be applied first, but it's not intentional.

Perhaps instead you could create a new option to indicate that the order should be respected.

#pragma clang loop respect_order <- optionally with (true) or (false)

That approach would avoid the inevitable conflicts of having both loop and transform pragmas on the same loop.

There is also a syntax difference:

#pragma clang loop vectorize_width(4)

compared to

#pragma clang transform vectorize width(4)

which in a similar form is also how it is done in OpenMP:

#pragma omp simd simdlen(4)

IMHO, a respect_order option is problematic, not only because it influences parsing while being parsed. It also makes the preferable behavior clunkier to use.

That approach would avoid the inevitable conflicts of having both loop and transform pragmas on the same loop.

I fear it will give us far worse ambiguities. Consider:

#pragma clang loop unroll(enable) respect_order unrollandjam(enable) unroll_count(4)

How often does it unroll?

That approach would avoid the inevitable conflicts of having both loop and transform pragmas on the same loop.

I fear it will give us far worse ambiguities. Consider:

#pragma clang loop unroll(enable) respect_order unrollandjam(enable) unroll_count(4)

How often does it unroll?

Just do not allow this form with respect_order clause.

Have we established general consensus for the desire to have the flexible enough loop optimization pass ordering to accomplish the outcome of the new directive, and shared vision for the path to get there? If we are making this a general clang directive, I'd like to see the vision to get there w/o depending on polly. If this is already discussed and settled, pointer to that is appreciated so that I can learn.

Thanks,
Hideki

Just do not allow this form with respect_order clause.

What exactly would be the rules what is allowed and what isn't?

We can just not allow not mixing #pragma clang loop and #pragma clang transform.

Have we established general consensus for the desire to have the flexible enough loop optimization pass ordering to accomplish the outcome of the new directive, and shared vision for the path to get there? If we are making this a general clang directive, I'd like to see the vision to get there w/o depending on polly. If this is already discussed and settled, pointer to that is appreciated so that I can learn.

Response to the RFCs was meager. However, I got positive feedback at various conferences, including last year's DevMtg where my version for loop transformations was a technical talk.

Have we established general consensus for the desire to have the flexible enough loop optimization pass ordering to accomplish the outcome of the new directive, and shared vision for the path to get there? If we are making this a general clang directive, I'd like to see the vision to get there w/o depending on polly. If this is already discussed and settled, pointer to that is appreciated so that I can learn.

Response to the RFCs was meager. However, I got positive feedback at various conferences, including last year's DevMtg where my version for loop transformations was a technical talk.

Personally, I like the intent. I don't foresee a clear (enough) path to get there. This leads to hesitation of adding a new non-experimental pragma and present it to programmers. If you call it experimental, it's easier for me to swallow.

Personally, I like the intent. I don't foresee a clear (enough) path to get there. This leads to hesitation of adding a new non-experimental pragma and present it to programmers. If you call it experimental, it's easier for me to swallow.

Is there anything more to do than mentioning as being it experimental in the (no-patch-available-yet) docs?

Meinersbur added a comment.EditedOct 18 2019, 12:36 PM

@Meinersbur, if I remember correctly, there was an RFC discussion on this topic, right? If yes, would you post the pointer to that? I need a refresher on what has been discussed/settled in the past.

My publications on this topic would also be useful here, available on arXiv. Here is an overview, also including previous discussions:

Loop optimization directives:

Loop attributes metadata:

Applying loop optimizations:

Personally, I like the intent. I don't foresee a clear (enough) path to get there. This leads to hesitation of adding a new non-experimental pragma and present it to programmers. If you call it experimental, it's easier for me to swallow.

Is there anything more to do than mentioning as being it experimental in the (no-patch-available-yet) docs?

If there is a precedence, just follow that. Else, how to spell an experimental clang pragma would be a good discussion topic by itself. If I need to provide a discussion starter, I'd say how about transform_experimental instead of transform. All I ask is somehow make it easier for programmers to know it is experimental so that they won't use it w/o first reading about the current state of support. I don't have a strong opinion about how to do so.

If others with stakes in loop optimizations foresee a clear enough path to get there, I won't insist this being experimental, but I would like to understand the path.

Thanks,
Hideki

If there is a precedence, just follow that. Else, how to spell an experimental clang pragma would be a good discussion topic by itself. If I need to provide a discussion starter, I'd say how about transform_experimental instead of transform. All I ask is somehow make it easier for programmers to know it is experimental so that they won't use it w/o first reading about the current state of support. I don't have a strong opinion about how to do so.

The precedences I have found are -fexperimental-pass-manager, -fexperimental-isel, std::experimental and clang-cl /openmp:experimental, Modules and a couple of features only mentioning "experimental" in their commit log.

I dislike changing the syntax syntax as it means that we will at one point break already written code or have to maintain two spellings. I'd rather just enable them with a command-line switch, such as -fexperimental-transform.

I'd rather just enable them with a command-line switch, such as -fexperimental-transform.

This direction works for me. -fexperimental-transform-pragma might be better, though.

reames resigned from this revision.Oct 28 2019, 10:44 AM
Meinersbur updated this revision to Diff 227550.Nov 1 2019, 4:21 PM
  • Use PRAGMA_ANNOTATION
  • Monorepo layout
Meinersbur updated this revision to Diff 227561.Nov 1 2019, 7:41 PM
  • Implement -f(no-)experimental-transform-pragma
ABataev added inline comments.Nov 19 2019, 9:07 AM
clang/lib/Parse/ParsePragma.cpp
3062

Add a message in this assert.

3063–3067

These asserts are covered by the very first one assert(!Tok.isAnnotation());

This is a major new language feature, and code review is probably not the right venue for reviewing it; there should be a thread on cfe-dev. My apologies if that's already been discussed and I missed it.

Meinersbur edited the summary of this revision. (Show Details)Dec 13 2019, 3:26 PM

This is a major new language feature, and code review is probably not the right venue for reviewing it; there should be a thread on cfe-dev. My apologies if that's already been discussed and I missed it.

I agree. I know the original RFC didn't get many responses but perhaps discussion will pick up given that patches now exist. The discussion should happen on both cfe-dev and llvm-dev because it affects both.

In particular I'd like to understand the motivation for this. Is it for experimental purposes, to aid in compiler tuning, to allow the user to override compiler decisions and/or something else?

This is a major new language feature, and code review is probably not the right venue for reviewing it; there should be a thread on cfe-dev. My apologies if that's already been discussed and I missed it.

For some reason Phabricator did not send me an email when you submitted your comment such that I missed it. I only noticed it with @greened comment. My apologies.

I will write send an RFC to the mailing-list.

simoll added a subscriber: simoll.Apr 7 2020, 5:02 AM

@Meinersbur I missed the RFC and discussion on the cfe-dev mailing list. Could you post a link here so that it's included in the history?

I don't have any opposition to this, and it seems that you have addressed all the comments from reviewers. So, unless there was something that came up from the RFC discussion (which I doubt, since you just pinged the patch), I think this is good to land. However, I'm not really in a position to approve the patch since the implementation is well out of my domain of expertise.

@Meinersbur I missed the RFC and discussion on the cfe-dev mailing list. Could you post a link here so that it's included in the history?

See the collection of links in a previous comment: https://reviews.llvm.org/D69088#1715025

In particular, you seem to look for this: https://lists.llvm.org/pipermail/cfe-dev/2018-May/058141.html . However, there was no response to that RFC.