This is an archive of the discontinued LLVM Phabricator instance.

[libc++] Add format check to CI
ClosedPublic

Authored by curdeius on Nov 27 2020, 6:58 AM.

Details

Reviewers
ldionne
EricWF
Mordante
miscco
cjdb
Group Reviewers
Restricted Project
Commits
rG1361c5e7d703: [libc++] Add format check to CI
Summary

Note: contrary to what I said previously, I didn't change .clang-format nor utils/generate_feature_test_macro_components.py script.

Diff Detail

Event Timeline

curdeius created this revision.Nov 27 2020, 6:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 27 2020, 6:58 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
curdeius requested review of this revision.Nov 27 2020, 6:58 AM

FYI, https://clang.llvm.org/docs/ClangFormattedStatus.html shows the status of LLVM parts being clang-formatted.

I really like that we get clang-format to work. Live is simply to short to worry about formatting.

I assume we should simply make one monster commit adding that to the exception list for git blame and be done with it

I really like that we get clang-format to work. Live is simply to short to worry about formatting.

I assume we should simply make one monster commit adding that to the exception list for git blame and be done with it

That would still make it more difficult to blame stuff if lines move around a lot. Another option would be to mandate that we clang-format everything we touch from now on. Thus, as we make changes to the library, the lines we change (and some amount of surrounding context) would be re-formatted. I'm not sure which approach is best, since I haven't done it before. I'd be curious to hear stories of folks who moved existing codebases to Clang format.

ldionne added inline comments.Nov 27 2020, 7:26 AM
libcxx/utils/generate_feature_test_macro_components.py
684

Shouldn't this be # ifdef (3 spaces) to get the ifdef aligned at 4 columns?

That would be interesting, but I believe that moving /splitting of lines should be common enough that it is already sorted out

Has clang-format been improved enough that it will not "wreck" std::less<void>? Last time I tried it on that code, it pessimized it pretty thoroughly.
Code pasted here for reference:

#if _LIBCPP_STD_VER > 11
template <>
struct _LIBCPP_TEMPLATE_VIS less<void>
{
    template <class _T1, class _T2>
    _LIBCPP_CONSTEXPR_AFTER_CXX11 _LIBCPP_INLINE_VISIBILITY
    auto operator()(_T1&& __t, _T2&& __u) const
    _NOEXCEPT_(noexcept(_VSTD::forward<_T1>(__t) < _VSTD::forward<_T2>(__u)))
    -> decltype        (_VSTD::forward<_T1>(__t) < _VSTD::forward<_T2>(__u))
        { return        _VSTD::forward<_T1>(__t) < _VSTD::forward<_T2>(__u); }
    typedef void is_transparent;
};
#endif

Also, @missco wrote:

I really like that we get clang-format to work. Live is simply to short to worry about formatting.

Writing is once, reading is many, many times.
Formatting that reveals the underlying structure of the code is a time-saver to future you (and others).

Has clang-format been improved enough that it will not "wreck" std::less<void>? Last time I tried it on that code, it pessimized it pretty thoroughly.
Code pasted here for reference:

#if _LIBCPP_STD_VER > 11
template <>
struct _LIBCPP_TEMPLATE_VIS less<void>
{
    template <class _T1, class _T2>
    _LIBCPP_CONSTEXPR_AFTER_CXX11 _LIBCPP_INLINE_VISIBILITY
    auto operator()(_T1&& __t, _T2&& __u) const
    _NOEXCEPT_(noexcept(_VSTD::forward<_T1>(__t) < _VSTD::forward<_T2>(__u)))
    -> decltype        (_VSTD::forward<_T1>(__t) < _VSTD::forward<_T2>(__u))
        { return        _VSTD::forward<_T1>(__t) < _VSTD::forward<_T2>(__u); }
    typedef void is_transparent;
};
#endif

I would like to note that with MSVC STL we simply work around those rare cases via //clang-format off. One main use case are requires clauses that are not yet supported. That said, the vast majority of the code works fine.

Also, @missco wrote:

I really like that we get clang-format to work. Live is simply to short to worry about formatting.

Writing is once, reading is many, many times.
Formatting that reveals the underlying structure of the code is a time-saver to future you (and others).

That is true, but consistency in a code base is incredibly valuable for new contributors. Having to wonder how to indent / format complicated STL code is a really hard hurdle for a first time contributor

Also consistency helps reading a lot. and as said above the few rare cases where it is detrimental one can always simply use // clang-format off

I agree with @miscco, the advantages of consistent formatting outweigh the inconveniences caused by clang-format's shortcomings.
Comments like "clang-format on/off" are one of the solutions to the problem. Another being to force line breaks with empty trailing comments.

Regarding the strategy of adopting clang-format, the majority of projects I've seen cannot reformat everything at once.
First step is always to enforce git clang-format (but with macro-heavy code care must be taken because the context may accidentally become bigger than intended, at least for all the already released versions of clang-format, there are some improvements in master).
Apart from that, reformatting rarely touched files is another option (there's been a proposal to more or less automatically reformat files that haven't been touched for e.g. 6 months). But that's not a necessity IMO.

curdeius added inline comments.Nov 27 2020, 8:24 AM
libcxx/utils/generate_feature_test_macro_components.py
684

The indentation of preprocessor directives depends on IndentWidth: (currently 2).
I find it a bit inconsistent, as it is:
#, then N * IndentWidth spaces, then directive (so directive is at the N * IndentWidth + 1 column)
against what I used to think and it seems what you think too:
#, N * IndentWidth - 1 spaces, directive (directive being at N * IndentWidth column.

Anyway, I wanted to match how clang-format would format it.
Note that we can keep this script unchanged, it's auto-generated anyway. I'd just keep the addition of clang-format on/off comments.

ldionne requested changes to this revision.Nov 27 2020, 8:32 AM

I think it's clear that we'll benefit from clang-format.

However, Marshall does have a point that sometimes, you want to format things in a specific way and not have clang-format mess it up. I think it's fine to use // clang-format off for those cases.

@curdeius Could you add a CI job that makes sure that the commit we're testing has been run through git clang-format? You could run it first in the pipeline, and we could even consider adding a wait step in the pipeline so that the rest of the pipeline doesn't run if you haven't clang-formated your changes.

This revision now requires changes to proceed.Nov 27 2020, 8:32 AM

@curdeius Could you add a CI job that makes sure that the commit we're testing has been run through git clang-format? You could run it first in the pipeline, and we could even consider adding a wait step in the pipeline so that the rest of the pipeline doesn't run if you haven't clang-formated your changes.

I'll see what I can do!

Thanks @curdeius for looking into this! The lack of proper clang-format rules for libc++ also bothered me a bit. Ut didn't bother me enough to look into it yet.

I also prefer to use automatic formatting as much as possible. If clang-format has things it doesn't format well, we might want to ask the clang-format devs so they can improve it. Then we can use the automatic formatting after the next release. That would benefit us and the other users of clang-format.

I prefer to have one commit to format everything, then we only need to blacklist one commit in the blame list. I would at least prefer to do all code in include and src in one commit. I don't have a preference whether we should also do the unit tests and benchmarks in one go or fix them when the files are touched.

I don't have a strong preference on which clang-format rules we use. However I think we should decide on the clang-format rules before formatting the code.

curdeius updated this revision to Diff 308167.Nov 28 2020, 8:37 AM
  • Break format for testing.
  • Add buildkite job for checking formatting.
curdeius updated this revision to Diff 308168.Nov 28 2020, 8:38 AM

Fix bad arc diff base.

I modified headers in include/ just to test the CI.

curdeius updated this revision to Diff 308173.Nov 28 2020, 9:14 AM
  • Wait after Format step.
curdeius updated this revision to Diff 308174.Nov 28 2020, 9:23 AM
  • Use tee instead of redirection.
curdeius updated this revision to Diff 308178.Nov 28 2020, 9:50 AM
  • Create BUILD_DIR.
Harbormaster completed remote builds in B80424: Diff 308174.
thakis added a subscriber: thakis.Nov 28 2020, 10:42 AM

If the idea is to reformat everything, I'd recommend just using the normal LLVM style, like the rest of the project.

I really like that we get clang-format to work. Live is simply to short to worry about formatting.

I assume we should simply make one monster commit adding that to the exception list for git blame and be done with it

That would still make it more difficult to blame stuff if lines move around a lot. Another option would be to mandate that we clang-format everything we touch from now on. Thus, as we make changes to the library, the lines we change (and some amount of surrounding context) would be re-formatted. I'm not sure which approach is best, since I haven't done it before. I'd be curious to hear stories of folks who moved existing codebases to Clang format.

This is a mostly solved problem: Git 2.23 added support for ignoring whitespace commits in blame output. You store hashes that should be blame-transparent in a file called .git-blame-ignore-revs (or any other name), check that file in, and then either pass --ignore-revs-file .git-blame-ignore-revs to git blame or run git config blame.ignoreRevsFile .git-blame-ignore-revs once to make git blame always use that blame file.

We auto-formatted a big chunk of Chromium's code a while ago. We did this before git 2.23 so we wrote a custom tool that does the same thing (https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/git-hyper-blame.html), but with git 2.23+ there's even upstream support for this.

curdeius planned changes to this revision.Nov 28 2020, 10:53 PM
curdeius updated this revision to Diff 308268.Nov 29 2020, 11:59 PM
  • Bash: set -o pipefail to fail on missing git-clang-format.

@ldionne, I'll need your help to set this up. Agents need to have git-clang-format installed. I haven't tried but I guess I can't apt install clang-format in the buildkite script and it doesn't sound like a great idea anyway.
And, wouldn't it be possible to reuse the existing machinery in llvm/clang premerge checks that runs git-clang-format? I am afraid it can't be reused because it uses different CI pipeline and different agents...

curdeius updated this revision to Diff 308270.Nov 30 2020, 12:08 AM
  • Get only clang-format.patch artifact.
ldionne requested changes to this revision.Dec 1 2020, 3:05 PM
ldionne added a subscriber: goncharov.

This is a mostly solved problem: Git 2.23 added support for ignoring whitespace commits in blame output. You store hashes that should be blame-transparent in a file called .git-blame-ignore-revs [...]

I've used that for smaller whitespace-changing patches, however does git do a good job at figuring out the blame correctly when stuff moves around a lot? My concern was that .git-blame-ignore-revs wouldn't work well for such a large reflow, however if it does, then I'd say my concerns are mostly alleviated.

We auto-formatted a big chunk of Chromium's code a while ago. We did this before git 2.23 so we wrote a custom tool that does the same thing (https://commondatastorage.googleapis.com/chrome-infra-docs/flat/depot_tools/docs/html/git-hyper-blame.html), but with git 2.23+ there's even upstream support for this.

That's large enough to be considered a success story, clearly.

@ldionne, I'll need your help to set this up. Agents need to have git-clang-format installed. I haven't tried but I guess I can't apt install clang-format in the buildkite script and it doesn't sound like a great idea anyway.

No, indeed. You'd need to update the Dockerfile and install the utility on the macos builders. I'll do it depending on what Mikhail says below -- it's easier for me since they are sitting a meter away :-).

And, wouldn't it be possible to reuse the existing machinery in llvm/clang premerge checks that runs git-clang-format? I am afraid it can't be reused because it uses different CI pipeline and different agents...

I guess it may be possible. @goncharov Mikhail, if we decide to enable clang-format for libc++, would it be possible to re-enable the clang-format checks? IIRC, I think I had requested to remove them cause they were always broken.

libcxx/include/variant
1666 ↗(On Diff #308270)

What controls this? I would *love* to move away from declaring the return type of the function above, since it gets lost in our numerous attributes.

libcxx/utils/ci/run-buildbot
99 ↗(On Diff #308270)

I think it would be reasonable to set this for the whole script. WDYT?

103 ↗(On Diff #308270)

We should also include libcxxabi.

This revision now requires changes to proceed.Dec 1 2020, 3:05 PM
curdeius updated this revision to Diff 308906.Dec 2 2020, 1:16 AM
  • Set -o pipefail for the whole script.
  • Run git-clang-format also on libcxxabi/.
  • Rebase.
curdeius added inline comments.Dec 2 2020, 1:18 AM
libcxx/include/variant
1666 ↗(On Diff #308270)

I guess the hypothetical AlwaysBreak*Before*ReturnType (there is only AlwaysBreakAfterReturnType currently). That shouldn't be something difficult to implement in clang-format I guess.
I agree that the return type can get lost with all the attributes.

curdeius marked 2 inline comments as done.Dec 2 2020, 1:19 AM
curdeius updated this revision to Diff 311515.Dec 14 2020, 1:19 AM

Rebase to trigger CI after buildbot changes in 202df6870ea2ac883ac64fab8e01c836da86978b ("[libc++] Install clang-format on CI nodes").

curdeius planned changes to this revision.Dec 14 2020, 3:35 AM

Waiting for D93201.

Do we need to discuss the wanted formatting in this patch or on Slack/dev-ml?

libcxx/.clang-format
5

Do we need to keep this at Cpp03? Changing this to newer versions of C++ will of course break code like 'std::vector<std::pair<int, int>>`. But this seems to break formatting the C++11 string literal prefixes: u8, u, U.

Concerning the style itself, personally I'd go for a style that is as close to vanilla LLVM style as possible (except for things like disabling comment reflowing for tests, but I'd put this specific .clang-format config file into libcxx/test).
For the moment I'm waiting for the buildbots to be ready so that I can finish the build part.

libcxx/.clang-format
5

I'd love to remove Cpp03... But clang in c++03 mode gives error: error: a space is required between consecutive right angle brackets (use '> >').
BTW, from clang-format documentation: "Cpp03 is a deprecated alias for c++03".
Personally I'd like to use Standard: Latest.
There's also Standard: Auto, that may be a compromise to use > > in c++03 sensitive code and >> elsewhere.

Ref: https://clang.llvm.org/docs/ClangFormatStyleOptions.html

Mordante added inline comments.Dec 15 2020, 8:58 AM
libcxx/.clang-format
5

True for C++98 the right angle brackets need spaces. AFAIK it's not possible to force that when using a newer C++ version. But C++03 will add a space when using u8"foo" and turn it into u8 "foo". This gives issue with C++11. C++14 code like 1'000'000 will also fail. So maybe we should switch to the newest mode and not format everything, but format per patch basis?

curdeius updated this revision to Diff 312397.Dec 17 2020, 12:55 AM

Rebase to trigger CI after D93201.

curdeius planned changes to this revision.Dec 17 2020, 1:23 AM
curdeius updated this revision to Diff 312401.Dec 17 2020, 1:25 AM
  • Use --binary /usr/bin/clang-format-11 as a temporary workaround.
curdeius updated this revision to Diff 312404.Dec 17 2020, 1:30 AM
  • Limit folders checked for formatting.
ldionne added inline comments.Jan 19 2021, 10:45 AM
libcxx/utils/ci/buildkite-pipeline.yml
18 ↗(On Diff #312404)

To unblock this and get started even without having entirely settled all formatting questions, I suggest we make this step a soft-fail:

soft_fail:
      - exit_status: 1

(described at https://buildkite.com/docs/pipelines/command-step#soft-fail-attributes)

Then, we'll be able to play around a bit with the configuration. Once we've settled on something, we can remove the soft-fail on the step to enforce formatting for new commits.

curdeius updated this revision to Diff 320075.Jan 29 2021, 1:50 AM

Add soft_fail. Revert .clang-format and generate_feature_test_macro_components.py changes.

curdeius planned changes to this revision.Jan 29 2021, 1:50 AM
curdeius marked an inline comment as done.
curdeius retitled this revision from [libc++] Update clang-format configuration to [libc++] Add format check to CI.Jan 29 2021, 1:53 AM
curdeius edited the summary of this revision. (Show Details)
curdeius updated this revision to Diff 320088.Jan 29 2021, 3:01 AM
  • Remove wait from CI after Format task. Revert changes in <vector> used for testing.

It would be cool if Buildkite -> Phabricator bridge handled the soft fails.
See previous version https://reviews.llvm.org/D92229?id=320075. It shows CI passed, no warning or whatever, even though Format (soft-) failed, https://buildkite.com/llvm-project/libcxx-ci/builds/1171.
Anyone knows where the code responsible for this is?

ldionne accepted this revision.Feb 4 2021, 8:37 AM

It would be cool if Buildkite -> Phabricator bridge handled the soft fails.
See previous version https://reviews.llvm.org/D92229?id=320075. It shows CI passed, no warning or whatever, even though Format (soft-) failed, https://buildkite.com/llvm-project/libcxx-ci/builds/1171.
Anyone knows where the code responsible for this is?

@goncharov might know.

This LGTM, I think this is a good start. We can start running that manually before we commit for now, to see what things look like. Once we're comfortable, we can move to the next level and make it mandatory.

This revision is now accepted and ready to land.Feb 4 2021, 8:37 AM
This revision was automatically updated to reflect the committed changes.