Page MenuHomePhabricator

[clang-format] Create a python documentation tool to generate a summary of the clang-format status for the whole of the LLVM project
ClosedPublic

Authored by MyDeveloperDay on May 27 2020, 6:12 AM.

Details

Summary

Any change to clang-format is tested with the unit tests, However sometimes the better approach is to run it over a very large fully formatted source tree and then inspect the differences. This seems to be a source of many of the regressions found by @krasimir and by @sylvestre.ledru and @Abpostelnicu who run it over the Mozilla sources, but often these regressions are only found after changes have been committed.

LLVM itself would be a good dog-fooding candidate for similar tests except such a large proportion of the tree is not 100% clang formatted, as such you are never aware if the change comes from a change to clang-format or just because the tree has not been formatted first.

I've heard all the reasons as to why we don't want to clang-format everything in one go (despite git having a mechanism to remove such wholesale changes from the git blame), and whilst I can live with that, I still think that getting to 100% is a viable goal further down the line over time.

However there are large parts of the LLVM source tree which are 100% (or almost) formatted (milr,flang,clangd,polly) but finding those areas such that are "clang-format clean" can be hard so its always been difficult for clang-format to be run against LLVM itself.

The following review is for a small python tool which scans the whole of the LLVM source tree and counts the number of files which have one or more clang-format violations.

This revision contains the tool and the output from the initial run of the tool and the generated documentation which looks like the following
(the style is taken from an OpenMP table which is used to show the progress)

This document is linked into the ClangFormat documentation and I hope might act as a gentle nudge to other LLVM developers to up their % so that we can run clang-format over much more code. If you want us to ensure clang-format doesn't break against your code in the future getting your directory to 100% is a great start.

I've plans to extend the tool later to generate a test file containing clean directories which could be used to autogenerate a regression test, that would mean we could 100% scan the clean areas of LLVM prior to any change.

I also feel we could extend this to include statistics and track the average clang-formatted % content over time, with a general goal for LLVM to become clang-format clean.

Diff Detail

Event Timeline

MyDeveloperDay created this revision.May 27 2020, 6:12 AM
sylvestre.ledru added a comment.EditedMay 28 2020, 5:55 AM

Well done!

For Firefox, we decided to go ahead and reformat everything at once.
It has been way easier this way.

You should run flake8 & autopep8 on your python code. it isn't idomatic Python code (";" are useless on python for example)

clang/docs/tools/generate_formatted_state.py
65

() are useless in python for if

clang/docs/tools/generate_formatted_state.py
49

print xx is python 2
please use
print(xx)

72

if clean:
and revert the two blocks :)

MyDeveloperDay planned changes to this revision.May 28 2020, 6:34 AM

I'm running through flake8... almost every line is changing ;-) ...let me update before wasting any more of your time.

MyDeveloperDay added a comment.EditedMay 28 2020, 7:19 AM

I'm adding a summary table which shows LLVM only 44% of all LLVM cpp/h files are clang-formatted. ;-(

fake8 the python
add a totals line at the bottom of the table

curdeius added inline comments.May 28 2020, 8:56 AM
clang/docs/tools/generate_formatted_state.py
18

It will still not work with Python 3, you need to pass bytes to write() method.

Great idea on this! I may borrow this idea and make something similar for some migrations I'm working on.

Sanity check: is this going to be run automatically when docs are generated or done offline with results committed? I don't have a preference, either has pros and cons, but I'm confused as to when this script will be run.

Great idea on this! I may borrow this idea and make something similar for some migrations I'm working on.

Sanity check: is this going to be run automatically when docs are generated or done offline with results committed? I don't have a preference, either has pros and cons, but I'm confused as to when this script will be run.

To be honest I was thinking I'd run it from time to time and commit the rst (like with do with ClangFormatStyleOptions.rst), It takes a good 20 minutes or so on my machine but it could potentially be something we could get github workflows to automate. but for now I wanted to use it to name and shame (oops did I say that out loud!) so we get more areas we can run clang-format regression testing on.

curdeius requested changes to this revision.May 28 2020, 2:54 PM

First of all, very nice idea. :)
Second, could you please make sure that the script is compatible with Python 3?
Also, in order to clean up a bit the generation of the RST (to avoid the repetition of all this output.write() stuff), you can use multiline strings """, or in general, take inspiration from e.g. https://github.com/llvm/llvm-project/blob/master/libcxx/utils/generate_feature_test_macro_components.py#L720.
Lastly, it would be nice to add last update date somewhere.

This revision now requires changes to proceed.May 28 2020, 2:54 PM

@MyDeveloperDay , I know it's a strange request, but could you change (or remove) the background color for 100% case.
I'm partially color-blind and having red and green background in the same table is really hard to distinguish. I guess I'm not alone.
I'd suggest using something like light blue, it doesn't need to stand out anyway.

clang/docs/tools/generate_formatted_state.py
51

Unused subdirs variable: change to _.

53

That doesn't work on Windows because of slashes. You doesn't skip unittests (present at least in clang and llvm).

57

Here you use camelCase, but in other places you use snake_case (e.g. file_path). Please make that consistent.

57–59

You can move it outside the loop.

81

Writing path directly on Windows will put backslashes that are not rendered in rst, so you should either change backslashes to forward slashes (that's what I'd suggest) or double the backslashes. You can just path.replace('\\', '/').

@MyDeveloperDay , I know it's a strange request, but could you change (or remove) the background color for 100% case.
I'm partially color-blind and having red and green background in the same table is really hard to distinguish. I guess I'm not alone.
I'd suggest using something like light blue, it doesn't need to stand out anyway.

Absolutely..

curdeius added a comment.EditedMay 29 2020, 2:52 AM

@MyDeveloperDay, I've played around with the script, you can take as much as you judge useful from here: https://github.com/mkurdej/llvm-project/tree/arcpatch-D80627.

MyDeveloperDay marked an inline comment as done.EditedMay 29 2020, 3:52 AM

Would something like this be easier to read?

clang/docs/tools/generate_formatted_state.py
53

So unit tests is something that I I think needs to be clang-formatted, this is because often we are actively editing in there, (I use format on save) and so having clean tests is super important

The tests directories normally have 100's of small snippets of code and some may even be testing unformatted code deliberately, these files are often made once and not continuously edited, (whilst it would be good to have them clean, I wanted to give ourselves a fighting chance!)

Point taken about Windows, whilst I develop myself on Windows I use cygwin which is why it probably worked.

Change % colors to help those with deuteranomaly
ensure consistent naming
validate for python3 and Windows
fix flake8 issues
switch to multi-line string templates rather than series of write() calls
add date and current commit hash of when the analysis was run

That looks nicer indeed. Thanks.
Just some minor nitty-gritty comments.

clang/docs/tools/generate_formatted_state.py
31

Nit: wrong indentation.

53

OK, I agree for unittests. But then one could argue that the same should apply for test, nope?

75

Both tail and _tail seem unused. You can change to _.

curdeius requested changes to this revision.May 29 2020, 8:05 AM
This revision now requires changes to proceed.May 29 2020, 8:05 AM
curdeius added inline comments.May 29 2020, 8:07 AM
clang/docs/tools/generate_formatted_state.py
55

Another nit: I prefer writing """\ as it nicely aligns with subsequent lines.

MyDeveloperDay marked 14 inline comments as done.

Addressed review comments
Ensure working with python3
Used improvements from @curdeius's version

clang/docs/tools/generate_formatted_state.py
53

The table would become massive because there are 100's of tests like

test/XXX1/file.cpp
test/XXX2/file.cpp
test/XXX3/file.cpp
test/XXX4/file.cpp

where XXX1 is the name of one specific test condition (this will make a row in the table)

I think for now we could exclude test, but like I said the GTESTS are where I spend half my time and I'd like those to be clean.

We can bring it back later when we are closer to 100% (if ever)

curdeius accepted this revision.May 29 2020, 12:10 PM

LGTM! Thanks!

This revision is now accepted and ready to land.May 29 2020, 12:10 PM
This revision was automatically updated to reflect the committed changes.