Any change to clang-format is tested with the unit tests. However, sometimes the better approach is to run it over a very large, fully formatted source tree and then inspect the differences. This seems to be how many of the regressions are found by @krasimir, and by @sylvestre.ledru and @Abpostelnicu, who run it over the Mozilla sources, but often these regressions are only found after changes have been committed.
LLVM itself would be a good dog-fooding candidate for similar tests, except that such a large proportion of the tree is not 100% clang-formatted. As such, you can never tell whether a difference comes from a change to clang-format or simply from the tree not having been formatted in the first place.
I've heard all the reasons why we don't want to clang-format everything in one go (despite git having a mechanism to hide such wholesale changes from git blame), and whilst I can live with that, I still think that getting to 100% over time is a viable goal.
However, there are large parts of the LLVM source tree which are 100% (or almost 100%) formatted (mlir, flang, clangd, polly), but finding the areas that are "clang-format clean" can be hard, so it has always been difficult to run clang-format against LLVM itself.
The following review is for a small Python tool which scans the whole of the LLVM source tree and counts the number of files which have one or more clang-format violations.
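As a rough illustration of the kind of scan described above (a minimal sketch, not the tool in this revision), the following counts files per top-level directory and how many of them are already clean. The function names, the extension list, and the grouping by top-level directory are assumptions made for the example. It treats a file as clean when the output of `clang-format -style=file` is byte-for-byte identical to the file on disk.

```python
import os
import subprocess
from collections import defaultdict

# Assumption: which extensions count as source files is a choice made for
# this sketch, not something taken from the tool under review.
SOURCE_EXTENSIONS = (".cpp", ".h", ".c", ".cc")


def clang_format_output(path):
    """Return the bytes clang-format would write for `path`.

    Without -i, clang-format prints the formatted file to stdout.
    """
    result = subprocess.run(
        ["clang-format", "-style=file", path],
        capture_output=True,
        check=True,
    )
    return result.stdout


def is_clean(path, formatter=clang_format_output):
    """A file is clean if formatting it would not change a single byte."""
    with open(path, "rb") as f:
        original = f.read()
    return formatter(path) == original


def count_violations(root, formatter=clang_format_output):
    """Map each top-level directory under `root` to (total_files, clean_files).

    Files that sit directly in `root` are grouped under ".".
    """
    stats = defaultdict(lambda: [0, 0])
    for dirpath, _dirnames, filenames in os.walk(root):
        top = os.path.relpath(dirpath, root).split(os.sep)[0]
        for name in filenames:
            if not name.endswith(SOURCE_EXTENSIONS):
                continue
            stats[top][0] += 1
            if is_clean(os.path.join(dirpath, name), formatter):
                stats[top][1] += 1
    return {directory: tuple(counts) for directory, counts in stats.items()}
```

Calling `count_violations` on a checkout root returns one `(total, clean)` pair per top-level directory, from which a percentage can be derived. The `formatter` parameter exists only so the sketch can be exercised without clang-format installed.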
This revision contains the tool, the output from its initial run, and the generated documentation, which looks like the following
(the style is taken from an OpenMP table which is used to show the progress)
This document is linked into the ClangFormat documentation, and I hope it might act as a gentle nudge to other LLVM developers to up their % so that we can run clang-format over much more code. If you want us to ensure clang-format doesn't break against your code in the future, getting your directory to 100% is a great start.
I plan to extend the tool later to generate a test file listing the clean directories, which could be used to autogenerate a regression test. That would mean we could scan all of the clean areas of LLVM prior to any change.
I also feel we could extend this to include statistics and track the average percentage of clang-formatted content over time, with a general goal for LLVM to become clang-format clean.
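To make that planned extension a little more concrete, here is one possible shape for it, again only a sketch under assumptions. It assumes the per-directory results are available as a mapping from directory name to a `(total_files, clean_files)` pair. The function names and the one-directory-per-line output format are hypothetical.

```python
def clean_directories(stats):
    """Return the sorted directories in which every counted file is clean.

    `stats` maps a directory name to (total_files, clean_files). Directories
    with no counted files are skipped, since they prove nothing either way.
    """
    return sorted(
        directory
        for directory, (total, clean) in stats.items()
        if total > 0 and clean == total
    )


def write_clean_list(stats, out_path):
    """Write one clean directory per line to `out_path` and return the list."""
    directories = clean_directories(stats)
    with open(out_path, "w", encoding="utf-8") as f:
        for directory in directories:
            f.write(directory + "\n")
    return directories
```

A regression check could then read that file, re-run clang-format over each listed directory with the patched binary, and treat any resulting difference as a suspected regression rather than as pre-existing noise.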