This is an archive of the discontinued LLVM Phabricator instance.

[clang] Add #pragma clang loop [no]prefetch() (part1)
Needs ReviewPublic

Authored by Flightor on Feb 20 2023, 4:06 AM.


Useage of pragma:
1. #pragma clang loop noprefetch(variable)
   1) `variable`:
      a memory reference(data to be prefetched), must be a declared
      pointer/array variable.

2. #pragma clang loop prefetch(variable, level, distance)
   1) `variable`:
      a memory reference(data to be prefetched), must be a declared
      pointer/array variable.
   2) `level`:
      an optional value to the compiler to specify the type of prefetch.
      '0': data will not be reused
      '1': L1 cache
      '2': L2 cache
      '3': L3 cache
      To use this argument, you must also specify `variable`.
   3) `distance`:
      an option integer argument with a value greater than 0. It indicates
      the number of loop iterations ahead of which a prefetch is issued,
      before the corresponding load or store instruction.
      To use this argument, you must also specify `variable` and `level`.

And add new medatada to support prefetch:

!llvm.loop.prefetch !0
!0 = distinct !{i1 true/false, i32 level, i32 distance}

1st arg: true: prefetch;  false: noprefetch
2nd arg: -1: default;     0-3: cache level
3rd arg: -1: default;     1-INT_MAX: iterations ahead

Diff Detail

Event Timeline

Flightor created this revision.Feb 20 2023, 4:06 AM
Herald added a project: Restricted Project. · View Herald Transcript
Herald added a subscriber: zzheng. · View Herald Transcript
Flightor requested review of this revision.Feb 20 2023, 4:06 AM

Can we also see the code that makes use of the new metadata?


The name and its comment is pretty undescriptive.


[style] If using auto, still use * and &.

Please also consider the LLVM coding style for your other code as well.


The metadata must be documented at, not just in a hard to find code comment.

The metadata itself seems not be connected to any loop, just to memory access instructions. Since instructions can be moved out of or into a loop, it might get associated with the wrong loop.


With this many tuple arguments, consider using a struct instead, documenting the meaning of its values.


ParseConstantExpression() can overwrite the Tok field, it must be checked whether it really is a comma before any further parsing.


Shouldn't a comma without expression result in an error?


Combining OpenMP with loop hints is unsupported, I don't think it needs to be tested.

Flightor added inline comments.Feb 22 2023, 4:50 PM

And some background info:

Yeah, it is not bound to the loop at present. You are right. I only deal with this situation in LICM. It is indeed best to add another metadata to bind to the loop id.

Flightor added inline comments.Feb 22 2023, 11:32 PM

Do you mean that the Tok before line 1458 must be a comma? If so, then the check is made on line 1451.


Because noprefetch only accepts one argument (different from prefetch), when the written argument is greater than the specified number of argument, the redundant information is automatically discarded. Referring to the design of other #pragma clang loop *, the alarm information is judged as warning.


For the message "Combining OpenMP with loop hints is unsupported", I have not found the description of LLVM. Where can I find this message? Thank you.

Flightor updated this revision to Diff 500030.Feb 23 2023, 6:26 PM
congzhe added a subscriber: congzhe.Mar 7 2023, 8:59 PM
congzhe added inline comments.

is ptr %0 supposed to be ptr %A?


The comment here seems to be a bit vague? It does not look clear what this value is -- essentially every value either "needs to be prefetched" or "does not need to be prefetched."

Would it be better to change it to something like The value that will be prefetched with "#pragma clang loop prefetch", or refrained from being prefetched with "#pragma clang loop noprefetch"?


Would it be better to use PrefetchHint* as elements in the vector instead of using PrefetchHint?


Would it be worthwhile adding test cases that have nested loops and with pragma only for the inner loop?

Meinersbur added inline comments.Mar 8 2023, 9:35 AM

I did not take into account that ParseConstantExpression() uses Tok as the first token. But it means that the comment // ',' is incorrect, it does not parse a comma, it parses the token after the comma and discards the comma that Tok was before. Consider using ConsumeToken or ExpectAndConsume instead.


This is probably an oversight, the message warning: extra tokens at end of '#pragma clang loop noprefetch' - ignored is at the very least misleeding:

#pragma clang loop noprefetch(a,) vectorize_enable(4)
for (int i = 0; i < n; ++i)

Is the vectorize_enable taken into account or not?

You also get it not just with additional argument, but anything following after parsing the constant expression:

#pragma clang loop unroll_count(4 4)
for (int i = 0; i < n; ++i)

It's not a message, it's just nobody has even considered it and hence undocumented. At the very least, the meaning is ambiguous. #pragma omp for may split the loop into two different loops, a "chunk"-loop and an outer loop over chunks and it is unclear to which it should even apply.

I tested it myself some time and Clang would just drop some metadata. I am surprised the !llvm.loop.prefetch is even there.

Thank for everyone's suggestions. I am redesigning the pragma and clarifying the application scenarios of the pragma.
More discusssion:
A new patch (D146122) to is used to support #pragma clang loop prefetch(disable) first.

iamlouk added a subscriber: iamlouk.Aug 9 2023, 8:18 AM