This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Do not parse certain characters in pragma directives
ClosedPublic

Authored by jhuber6 on Oct 17 2022, 11:36 AM.

Details

Summary

Currently, we parse lines inside of a compiler #pragma the same way we
parse any other line. This is fine for some cases, like separating
expressions and adding proper spacing, but in others it causes some poor
results from miscategorizing some tokens.

For example, the OpenMP offloading uses certain clauses that contain
special characters like map(tofrom : A[0:N]). This will be formatted
poorly as it will be split between lines on the first colon.
Additionally the subscript notation will lead to poor spacing. This can
be seen in the OpenMP tests as the automatic clang formatting with
inevitably ruin the formatting.

For example, the following contrived example will be formatted poorly.

#pragma omp target teams distribute collapse(2) map(to: A[0 : M * K])  \
    map(to: B[0:K * N]) map(tofrom:C[0:M*N]) firstprivate(Alpha) \
    firstprivate(Beta) firstprivate(X) firstprivate(D) firstprivate(Y) \
    firstprivate(E) firstprivate(Z) firstprivate(F)

This results in this when formatted, which is far from ideal.

#pragma omp target teams distribute collapse(2) map(to                         \
                                                    : A [0:M * K])             \
    map(to                                                                     \
        : B [0:K * N]) map(tofrom                                              \
                           : C [0:M * N]) firstprivate(Alpha)                  \
        firstprivate(Beta) firstprivate(X) firstprivate(D) firstprivate(Y)     \
            firstprivate(E) firstprivate(Z) firstprivate(F)

This patch seeks to improve this by adding extra logic where the parsing goes
awry. This is primarily caused by the colon being parsed as an inline-asm
directive and the brackes an objective-C expressions. Also the line gets
indented every single time the line is dropped.

This doesn't implement true parsing handling for OpenMP statements.

Diff Detail

Event Timeline

jhuber6 created this revision.Oct 17 2022, 11:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 17 2022, 11:36 AM
jhuber6 requested review of this revision.Oct 17 2022, 11:36 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 17 2022, 11:36 AM

Pretty interesting, it looks ok from what I can tell, let the others take a look

MyDeveloperDay added a project: Restricted Project.Oct 17 2022, 2:58 PM

Pretty interesting, it looks ok from what I can tell, let the others take a look

Thanks, I was originally hoping I could avoid adding a new boolean for InPragma by asking something like Line.startswith(tok::pp_pragma) but that didn't seem to work.

This revision is now accepted and ready to land.Oct 18 2022, 1:45 PM
This revision was landed with ongoing or failed builds.Oct 18 2022, 2:38 PM
This revision was automatically updated to reflect the committed changes.
hans added a subscriber: hans.Dec 12 2022, 6:04 AM

Chromium is seeing a formatting regression after this: https://github.com/llvm/llvm-project/issues/59473

Chromium is seeing a formatting regression after this: https://github.com/llvm/llvm-project/issues/59473

My guess is from this line, we could be more specific on the type of pragma.

if (State.Line->InPragmaDirective)
  return CurrentState.Indent + Style.ContinuationIndentWidth;

Chromium is seeing a formatting regression after this: https://github.com/llvm/llvm-project/issues/59473

Can we get more specific about what Chromium is seeing?

thakis added a subscriber: thakis.Dec 12 2022, 7:01 AM

Can we get more specific about what Chromium is seeing?

https://github.com/llvm/llvm-project/issues/59473 has a standalone repro. What else would you like to see?

I think you need to parse the first paren as whole and do not indent it.

Should we revert this patch?

clang/unittests/Format/FormatTest.cpp
5175

Why was this test case changed? It seemed to be related to the regression mentioned in D136100#3988574.

I would prefer that this doesn't get reverted, see the summary for the awful results for OpenMP without this patch. A potential solution would be to parse the next token and only add the indent if it's omp.

clang/unittests/Format/FormatTest.cpp
5175

It's definitely related, we want some indentation for successive OpenMP clauses pushed to a new line, see below.