This is an archive of the discontinued LLVM Phabricator instance.

[clang-format] Proposal for clang-format to give compiler style warnings
ClosedPublic

Authored by MyDeveloperDay on Oct 6 2019, 8:59 AM.

Details

Summary

Related somewhat to D29039: [clang-format] Proposal for clang-format -r option

On seeing a quote on twitter by @invalidop

If it's not formatted with clang-format it's a build error.

This made me want to change the way I use clang-format into a tool that could optionally show me where my source code violates clang-format syle.

When I'm making a change to clang-format itself, one thing I like to do to test the change is to ensure I didn't cause a huge wave of changes, what I want to do is simply run this on a known formatted directory and see if any new differences arrive in a manner I'm used to.

This started me thinking that we should allow build systems to run clang-format on a whole tree and emit compiler style warnings about files that fail clang-format in a form that would make them as a warning in most build systems and because those build systems range in their construction I don't think its unreasonable to NOT expect them to have to do the directory searching or parsing the output replacements themselves, but simply transform that into an error code when there are changes required.

I am starting this by suggesing adding a -n or -dry-run command line argument which would emit a warning/error of the form

Support for various common compiler command line argumuments like '-Werror' and '-ferror-limit' could make this very flexible to be integrated into build systems and CI systems.

> $ /usr/bin/clang-format --dry-run ClangFormat.cpp -ferror-limit=3 -fcolor-diagnostics
> ClangFormat.cpp:54:29: warning: code should be clang-formatted [-Wclang-format-violations]
> static cl::list<std::string>
>                             ^
> ClangFormat.cpp:55:20: warning: code should be clang-formatted [-Wclang-format-violations]
> LineRanges("lines", cl::desc("<start line>:<end line> - format a range of\n"
>                    ^
> ClangFormat.cpp:55:77: warning: code should be clang-formatted [-Wclang-format-violations]
> LineRanges("lines", cl::desc("<start line>:<end line> - format a range of\n"
>                                                                             ^

Diff Detail

Event Timeline

MyDeveloperDay created this revision.Oct 6 2019, 8:59 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 6 2019, 8:59 AM
Herald added a subscriber: mgorny. · View Herald Transcript
klimek added a comment.Oct 6 2019, 9:18 AM

Why not build a tool that takes the diff output of clang-format and changes it to what you want? Wouldn't that be a couple of lines of python?

MyDeveloperDay added a comment.EditedOct 6 2019, 9:46 AM

Why not build a tool that takes the diff output of clang-format and changes it to what you want? Wouldn't that be a couple of lines of python?

Thanks for taking the time to look at this idea. At least initially I can see that that would seem like a low impact way of doing it, and this also relates to D29039: [clang-format] Proposal for clang-format -r option, initially I felt this is not clang-formats job.. but the more I look online the more I see people using tricks to parse the output-replacements-xml

https://stackoverflow.com/questions/22866609/can-clang-format-tell-me-if-formatting-changes-are-necessary

But really what it means is each developer/group has to come up with their own solution or we have to develop some external python script.

But take a look at the hoops people are jumping thought just to determine if something needs to be formatted. All the additional tools like find/xargs/grep and these tools aren't always truly cross-platform, throw in additional complications of those people not using WSL,MingW or Cygwin and you now need a powershell or cmd.exe solution too. To make matters work every build system is different or needs some $shell() magic to run a series of commands, and all of that's before we even mention the whole python issue of getting it to work reliably in both Cygwin and/or Windows. (did I even mention the whole color diagnostics etc.. that you get for free by reusing the LLVM classes)

When you consider that really apart of the setting up of the DiagnosticsEngine (which is a large portion of this code change), its actually quite a small fix. (which still needs lit tests in clang/test/Format I know, I'm trying to get those working and failing miserably)

This idea really amounts to making the xml <replacements> human-readable, in a way in which both developers but also build systems which are used to parsing and/or displaying the output of compilers know and understand.

Like the quote says, its about making clang-format violations seem much more like build errors:

If it's not formatted with clang-format it's a build error.

Using a suggestion from D68551: [clang-format] [NFC] Ensure clang-format is itself clang-formatted. as an example of its usefulness, I modified my clang-format CMakeList.txt to do as Polly does and add clang-format check rules, combining this with this revision gives a nice integration into the build system that lets you quickly and easily see where violations creep in.

I really feel this could help bring clang-format checking into developers build workflows using a similar mechanism and it really highlights what is clean and what is not.

# Add target to check formatting of files
file( GLOB files *.cpp ../../lib/Format/*.cpp ../../lib/Format*.h ../../unittests/Format/*.cpp ../../include/clang/Format.h)

set(check_format_depends)
set(update_format_depends)
set(i 0)
foreach (file IN LISTS files)
  add_custom_command(OUTPUT clang-format-check-format${i}
    COMMAND clang-format -i -ferror-limit=1 --dry-run -sort-includes -style=llvm ${file}
    VERBATIM
    COMMENT "Checking format of ${file}..."
  )
  list(APPEND check_format_depends "clang-format-check-format${i}")

  add_custom_command(OUTPUT clang-format-update-format${i}
    COMMAND clang-format -sort-includes -i -style=llvm ${file}
    VERBATIM
    COMMENT "Updating format of ${file}..."
  )
  list(APPEND update_format_depends "clang-format-update-format${i}")

  math(EXPR i ${i}+1)
endforeach ()

add_custom_target(clang-format-check-format DEPENDS ${check_format_depends})
set_target_properties(clang-format-check-format PROPERTIES FOLDER "ClangFormat")

add_custom_target(clang-format-update-format DEPENDS ${update_format_depends})
set_target_properties(clang-format-update-format PROPERTIES FOLDER "ClangFormat")
[ 94%] Built target obj.clangSema
[ 94%] Built target clangAST
[ 94%] Built target clangSema
[ 94%] Built target clang-format
[ 94%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../lib/Format/AffectedRangeManager.cpp...
[ 94%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../lib/Format/BreakableToken.cpp...
[ 94%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../lib/Format/ContinuationIndenter.cpp...
[ 94%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../lib/Format/FormatToken.cpp...
[ 94%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../lib/Format/Format.cpp...
[ 94%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../lib/Format/FormatTokenLexer.cpp...
[ 94%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../lib/Format/NamespaceEndCommentsFixer.cpp...
[ 94%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../lib/Format/SortJavaScriptImports.cpp...
[ 94%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../lib/Format/TokenAnalyzer.cpp...
[ 94%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../lib/Format/TokenAnnotator.cpp...
[ 94%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../lib/Format/UnwrappedLineFormatter.cpp...
[100%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../lib/Format/UsingDeclarationsSorter.cpp...
[100%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../lib/Format/UnwrappedLineParser.cpp...
[100%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../lib/Format/WhitespaceManager.cpp...
[100%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../unittests/Format/CleanupTest.cpp...
[100%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../unittests/Format/FormatTest.cpp...
[100%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../unittests/Format/FormatTestComments.cpp...
[100%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../unittests/Format/FormatTestCSharp.cpp...
[100%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../unittests/Format/FormatTestJS.cpp...
[100%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../unittests/Format/FormatTestJava.cpp...
C:/llvm/llvm-project/clang/tools/clang-format/../../unittests/Format/CleanupTest.cpp:455:2: warning: code should be clang-formatted [-Wclang-format-violations]
}
 ^
[100%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../unittests/Format/FormatTestProto.cpp...
[100%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../unittests/Format/FormatTestObjC.cpp...
[100%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../unittests/Format/FormatTestRawStrings.cpp...
C:/llvm/llvm-project/clang/tools/clang-format/../../unittests/Format/FormatTestComments.cpp:31:21: warning: code should be clang-formatted [-Wclang-format-violations]
  enum StatusCheck {
                    ^
[100%] C:/llvm/llvm-project/clang/tools/clang-format/../../unittests/Format/FormatTestJS.cpp:51:47: warning: code should be clang-formatted [-Wclang-format-violations]
    EXPECT_EQ(Code.str(), format(Code, Style))
                                              ^
Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../unittests/Format/FormatTestTableGen.cpp...
[100%] Checking format of C:/llvm/llvm-project/clang/tools/clang-format/../../unittests/Format/FormatTestSelective.cpp...
C:/llvm/llvm-project/clang/tools/clang-format/../../unittests/Format/FormatTestJava.cpp:455:13: warning: code should be clang-formatted [-Wclang-format-violations]
  EXPECT_EQ(
            ^

I don't care for the command line switches that try to emulate compiler flags. I am also of the opinion that external scripts should be used for this functionality. git for Windows gives you a full set of bash utilities to utilize, so doing stuff like this on Windows is much easier now.

I don't care for the command line switches that try to emulate compiler flags. I am also of the opinion that external scripts should be used for this functionality. git for Windows gives you a full set of bash utilities to utilize, so doing stuff like this on Windows is much easier now.

For me, this is about easy integration into IDEs and build systems of a variety of flavours (make,cmake,msbuild,VS,etc..) so clang-format can be run as a pre/post build step.

It's not that it can't be done without using a series of scripts and additional processes like find/python/grep etc.. its just its a right pain! I often find I have to jump through hoops to get Visual Studio's build steps to work calling external scripts, (often calling a .bat file, which runs a bash file etc...) pulling some of that functionality into clang-format prevents the need to jump through those hoops. (I see this as no different to grep having a -r option? why do that when you could use find?)

The ironic thing is I run this on my local build for clang-format itself and it catches when people ask me to land their patches but they haven't remembered to git clang-format, and that is the point of the proposal, people either can't or don't run arc lint, they forget to run git clang-format and code creeps back in (even into clang-format) that is unformatted.

This is why it needs to be part of the build step and it needs to present itself in a way people are expecting from a compiler, especially so the buildbot picks this output up and presents it as a warning, and ultimately so we can stop saying in code review "please clang format".

The compiler like switches -Werror, and -ferror-limit=1 allow you to either stop your build or get a single warning per file giving you fine control about how invasive this is for your build output.

Having the output emulate the compiler output also has the added advantage of allowing a user to double click on the error/warning in visual studio and be taken to the exact format violation, (this has been especially useful as I inspect the C# in .NET core, whilst I try to work out what rules I'm missing for formatting C#), I could see the same would be useful for someone adopting clang-format and trying to determine which options to use and exactly how much code will change. (again this is something I use on clang-format myself when we make a code change, I quickly run this over a known formatted source tree to see what files are going to change if any, this lets me assess the change beyond what the unit tests are testing)

In case others are interested, I've landed this behaviour in my own experimental fork and made a release if people would like to try. https://github.com/mydeveloperday/clang-experimental

I take your points, but I think its worth it, given how simple the change is.

As this behaviour is not something clang-format should do because there are external tool that can do this for us, I thought I'd set myself a challenge of trying to write this with external tools, to prove to myself that it was indeed easy, what I want to achieve is being able to go to a directory and determine if there are any clang-format changes needed in the source files of the levels below with a single command.

The idea would be that this could be something that could fit on the command line, or made into a bash alias/function (more likely), I really don't want to turn to python as if I'm coding this in another language I might as well be coding it in C++ inside clang-format and it opens me up to the whole can't find the correct python headache. (but I also think the clang diagnostic classes give doing it inside C++ the edge)

I want to take into account at least the concepts from D29039: [clang-format] Proposal for clang-format -r option about being able to look deeply into a directory.

There are a couple of requirements: (some nice to have's)

  1. look deeply into a directory for all files supported by clang-format (C++,C# etc)
  2. file extensions should be case insensitive i.e. .cpp or .CPP
  3. be able to clearly see the name of the file that need clang-formatting
  4. don't actually do the formatting (tell me what needs doing but don't necessarily make the change)
  5. ideally show a snippet of the file where the change is needed (I guess this isn't essential)
  6. This should be cross platform windows, linux, (I guess other *nix's) it should work in at least the common shells bash,cygwin,powershell

The cross platform is the real killer, this forces you down the python road, (or the C++) if you want a common solution otherwise you are implementing different solutions on different plaforms.

But lets ignore that and lets start with a Linux or Cygwin bash shell. Surely the best way to find all source files that can be converted is via a find. In the past I've only ever used find like this to find multiple extensions

find .\( -name "*.h" -o name "*.cpp" \) -print

But this is really unworkable for so many output formats that clang-format supports, and even if you could get that to work it would be case sensitive unless you uses -ilname which would make the command line even longer

So lets try using -iregex, this is a much shorter case insensitive command but still, it needs a certain amount of escaping in the regular expression, (lets hope we never do "clang-format all the things") as this could get longer.

find . -iregex '.*\.\(cpp\|cc\|c\+\+\|cxx\|c\|cl\|h\|hh\|hpp\|m\|mm\|inc\|js\|ts\|protor\|protodevel\|java\|cs\)' -print

if I want to put that into an alias then I need switch the ' and " around

alias find_all_code='find . -iregex ".*\.\(cpp\|cc\|c\+\+\|cxx\|c\|cl\|h\|hh\|hpp\|m\|mm\|inc\|js\|ts\|protor\|protodevel\|java\|cs\)" -print'

Is the same thing as easy in powershell? Ok not really .. this is as far as I got, and I couldn't work out how to handle multiple extensions let alone case sensitivity.. (it feels very slow too!)

PowerShell.exe -Command "dir -Recurse -ea 0 | ? FullName -like '*.cpp'

So at least my brief initial conclusion is... if we are not using C++, its Python or nothing if you want cross-platform or maybe find if you only care about *nix

But even the find... that only covers 1) and 2) of my requirements...how do I determine even with find which files need formatting..

This command would show me the contents of everything that needs changing.

find . -iregex '.*\.\(cpp\|cc\|c\+\+\|cxx\|c\|cl\|h\|hh\|hpp\|m\|mm\|inc\|js\|ts\|protor\|protodevel\|java\|cs\)' -exec clang-format {} \;

This would give me the list of changes, but then I lose the name of the file.

find . -iregex '.*\.\(cpp\|cc\|c\+\+\|cxx\|c\|cl\|h\|hh\|hpp\|m\|mm\|inc\|js\|ts\|protor\|protodevel\|java\|cs\)' -exec clang-format --output-replacements-xml {} \;

So using external tools is easy how? the dumping of the output to stdout or the replacement xml to stdout makes this much harder as I lose the filename that was passed to clang-format, (there is probably an xargs trick here I'm missing)

This proposed solution would make all this possible just with:

find . -iregex '.*\.\(cpp\|cc\|c\+\+\|cxx\|c\|cl\|h\|hh\|hpp\|m\|mm\|inc\|js\|ts\|protor\|protodevel\|java\|cs\)' -exec clang-format -i -n {} \;

and more to the point the list of file types supported by clang-format is something clang-format knows... why not combine this with D29039: [clang-format] Proposal for clang-format -r option and just do

clang-format -r -i -n .

This provides in my view the only cross platform, shell independent way of doing this in what is a relatively minimal code change, and I'm struggling to see this as easy to do with an external tools? but if someone wants to show me how, I'd be interested to learn.

As this behaviour is not something clang-format should do because there are external tool that can do this for us, I thought I'd set myself a challenge of trying to write this with external tools, to prove to myself that it was indeed easy, what I want to achieve is being able to go to a directory and determine if there are any clang-format changes needed in the source files of the levels below with a single command.

The idea would be that this could be something that could fit on the command line, or made into a bash alias/function (more likely), I really don't want to turn to python as if I'm coding this in another language I might as well be coding it in C++ inside clang-format and it opens me up to the whole can't find the correct python headache. (but I also think the clang diagnostic classes give doing it inside C++ the edge)

I want to take into account at least the concepts from D29039: [clang-format] Proposal for clang-format -r option about being able to look deeply into a directory.

There are a couple of requirements: (some nice to have's)

  1. look deeply into a directory for all files supported by clang-format (C++,C# etc)
  2. file extensions should be case insensitive i.e. .cpp or .CPP
  3. be able to clearly see the name of the file that need clang-formatting
  4. don't actually do the formatting (tell me what needs doing but don't necessarily make the change)
  5. ideally show a snippet of the file where the change is needed (I guess this isn't essential)
  6. This should be cross platform windows, linux, (I guess other *nix's) it should work in at least the common shells bash,cygwin,powershell

The cross platform is the real killer, this forces you down the python road, (or the C++) if you want a common solution otherwise you are implementing different solutions on different plaforms.

But lets ignore that and lets start with a Linux or Cygwin bash shell. Surely the best way to find all source files that can be converted is via a find. In the past I've only ever used find like this to find multiple extensions

find .\( -name "*.h" -o name "*.cpp" \) -print

But this is really unworkable for so many output formats that clang-format supports, and even if you could get that to work it would be case sensitive unless you uses -ilname which would make the command line even longer

So lets try using -iregex, this is a much shorter case insensitive command but still, it needs a certain amount of escaping in the regular expression, (lets hope we never do "clang-format all the things") as this could get longer.

find . -iregex '.*\.\(cpp\|cc\|c\+\+\|cxx\|c\|cl\|h\|hh\|hpp\|m\|mm\|inc\|js\|ts\|protor\|protodevel\|java\|cs\)' -print

if I want to put that into an alias then I need switch the ' and " around

alias find_all_code='find . -iregex ".*\.\(cpp\|cc\|c\+\+\|cxx\|c\|cl\|h\|hh\|hpp\|m\|mm\|inc\|js\|ts\|protor\|protodevel\|java\|cs\)" -print'

Is the same thing as easy in powershell? Ok not really .. this is as far as I got, and I couldn't work out how to handle multiple extensions let alone case sensitivity.. (it feels very slow too!)

PowerShell.exe -Command "dir -Recurse -ea 0 | ? FullName -like '*.cpp'

So at least my brief initial conclusion is... if we are not using C++, its Python or nothing if you want cross-platform or maybe find if you only care about *nix

But even the find... that only covers 1) and 2) of my requirements...how do I determine even with find which files need formatting..

This command would show me the contents of everything that needs changing.

find . -iregex '.*\.\(cpp\|cc\|c\+\+\|cxx\|c\|cl\|h\|hh\|hpp\|m\|mm\|inc\|js\|ts\|protor\|protodevel\|java\|cs\)' -exec clang-format {} \;

This would give me the list of changes, but then I lose the name of the file.

find . -iregex '.*\.\(cpp\|cc\|c\+\+\|cxx\|c\|cl\|h\|hh\|hpp\|m\|mm\|inc\|js\|ts\|protor\|protodevel\|java\|cs\)' -exec clang-format --output-replacements-xml {} \;

So using external tools is easy how? the dumping of the output to stdout or the replacement xml to stdout makes this much harder as I lose the filename that was passed to clang-format, (there is probably an xargs trick here I'm missing)

This proposed solution would make all this possible just with:

find . -iregex '.*\.\(cpp\|cc\|c\+\+\|cxx\|c\|cl\|h\|hh\|hpp\|m\|mm\|inc\|js\|ts\|protor\|protodevel\|java\|cs\)' -exec clang-format -i -n {} \;

and more to the point the list of file types supported by clang-format is something clang-format knows... why not combine this with D29039: [clang-format] Proposal for clang-format -r option and just do

clang-format -r -i -n .

This provides in my view the only cross platform, shell independent way of doing this in what is a relatively minimal code change, and I'm struggling to see this as easy to do with an external tools? but if someone wants to show me how, I'd be interested to learn.

I think it's easy enough to do as a kind of driver, either in python or C++, but then again, at that point putting it into ClangFormat seems fine. One thing that I find on the one side kinda nice, but on the other side also weird is the names of the flags. Making them more similar to clang flags is nice from the point of people used to clang being able to quickly identify them, but the structure also makes it look like clang-format would support a wider range of flags, which it doesn't. Not sure which side I'm on, and I'm fine with the patch as is (minus pulling out a function), but wanted to bring it up if folks have ideas.

clang/tools/clang-format/ClangFormat.cpp
445

I'd pull this block out into a function.

MyDeveloperDay marked 2 inline comments as done.Oct 10 2019, 2:45 AM
MyDeveloperDay added inline comments.
clang/tools/clang-format/ClangFormat.cpp
445

me too I've already done that, plus I think I'm leaking the TestDiagnosticPrinter so I've corrected that locally. I'll update the patch but I'd like to do some general tidying too which is why I submitted D68767: [clang-format] NFC - Move functionality into functions to help code structure

MyDeveloperDay marked an inline comment as done.EditedOct 10 2019, 2:53 AM

I think it's easy enough to do as a kind of driver, either in python or C++, but then again, at that point putting it into ClangFormat seems fine. One thing that I find on the one side kinda nice, but on the other side also weird is the names of the flags. Making them more similar to clang flags is nice from the point of people used to clang being able to quickly identify them, but the structure also makes it look like clang-format would support a wider range of flags, which it doesn't. Not sure which side I'm on, and I'm fine with the patch as is (minus pulling out a function), but wanted to bring it up if folks have ideas.

The reason I added -Wclang-format-violations and -Wnoclang-format-violations was not only because I wanted to treat them like compiler warnings, but also so that the command line argument could be used the turning on/off to alter the exit code so as to either stop a build or continue based on -Werror or if you'd turn the individual warning off.

Also I know there are external tools, (I think SonarCube) which I think could read the code in the [] e.g. 'ClangFormat.cpp:54:29: warning: code should be clang-formatted [-Wclang-format-violations]' and be able to use that along with the file, row and column information to categorize and the annotate source code.

if you think this is too much perhaps I should make a new tool clang-format-check that did only this, but then I am concerned that it would be almost 100% clang-format anyway, and it wouldn't be long before a request came in to merge the functionality.

Let me update the patch so it looks even less invasive.

I think it's easy enough to do as a kind of driver, either in python or C++, but then again, at that point putting it into ClangFormat seems fine. One thing that I find on the one side kinda nice, but on the other side also weird is the names of the flags. Making them more similar to clang flags is nice from the point of people used to clang being able to quickly identify them, but the structure also makes it look like clang-format would support a wider range of flags, which it doesn't. Not sure which side I'm on, and I'm fine with the patch as is (minus pulling out a function), but wanted to bring it up if folks have ideas.

The reason I added -Wclang-format-violations and -Wnoclang-format-violations was not only because I wanted to treat them like compiler warnings, but also so that the command line argument could be used the turning on/off to alter the exit code so as to either stop a build or continue based on -Werror or if you'd turn the individual warning off.

Also I know there are external tools, (I think SonarCube) which I think could read the code in the [] e.g. 'ClangFormat.cpp:54:29: warning: code should be clang-formatted [-Wclang-format-violations]' and be able to use that along with the file, row and column information to categorize and the annotate source code.

if you think this is too much perhaps I should make a new tool clang-format-check that did only this, but then I am concerned that it would be almost 100% clang-format anyway, and it wouldn't be long before a request came in to merge the functionality.

I ran thruogh the same line of reasoning in my head when writing my last comment :) The argument about readability of error messages by tools is definitely a good one, so I'd say this is fine.

builds ontop of D68767: [clang-format] NFC - Move functionality into functions to help code structure move replacement warnings into a separate function and move outputXML out into its own function to declutter the format() function a little

I ran through the same line of reasoning in my head when writing my last comment :) The argument about the readability of error messages by tools is definitely a good one, so I'd say this is fine.

that's great

klimek accepted this revision.Oct 11 2019, 2:48 AM

lg

clang/tools/clang-format/ClangFormat.cpp
293

I'd name this getInvalidBOM and make the comment
// If BufStr has an invalid BOM, returns the BOM name; otherwise, returns nullptr.

This revision is now accepted and ready to land.Oct 11 2019, 2:48 AM

I'm going to commit this as it was approved, but I've added some simple lit tests, which exposed a double free, I fixed that up, and for the lit tests not to fail with an exit code I now return WarningAsError from the emitReplacements function.

This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Oct 12 2019, 3:52 PM

This fails on macOS:

: 'RUN: at line 2';   grep -E "*code should be clang-formatted*" /Users/thakis/src/llvm-project/out/gn/obj/clang/test/Format/Output/dry-run.cpp.tmp.stderr
--
Exit Code: 2

Command Output (stderr):
--
grep: repetition-operator operand invalid

grep -E means extended regex, where you want .* to match anything. But this probably should use FileCheck, or maybe even -verify since you're adding a dep on Frontend anyways.

I've reverted this for now to green up the bots.

For the reland, did y'all consider the impact of this on clang-format build time (it now depends on Frontend and hence on Driver and Sema, and Sema is slow to build) and binary size (it's now basically impossible to ever get the diagnostics tables in clang-format gc'd by the linker)?

It might make sense to instead use LLVM's diag stuff (instead of clang's) since all the actual diags will be disjoint.

I've reverted this for now to green up the bots.

For the reland, did y'all consider the impact of this on clang-format build time (it now depends on Frontend and hence on Driver and Sema, and Sema is slow to build) and binary size (it's now basically impossible to ever get the diagnostics tables in clang-format gc'd by the linker)?

It might make sense to instead use LLVM's diag stuff (instead of clang's) since all the actual diags will be disjoint.

Thanks for doing this, I am struggling to find the MacOS bot log that failed, are any available via Buildbot? (I notice the log looks like your own machine)

I did a quick test, with the backed out change and it's ~200KB bigger with this revision on windows. (that's probably an assertion, debug build) as the last one in the LLVM snapshot builds is only 3,518,976 bytes

-rwxr-xr-x 1 None 15369728 Oct 12 17:36 clang-format-diags.exe
-rwxr-xr-x 1 None 15080960 Oct 13 10:44 clang-format.exe

on my machine, the current release size is just 2,708,480 bytes (I'm using VS2017) - (not sure why its smaller)

-rwxr-xr-x 1 None 2708480 Oct 13 11:00 clang-format.exe

make clang-format build speed with reverted change is 4m12.504s (Release build from clean)

-rwxr-xr-x 1 None 2817024 Oct 13 11:22 clang-format.exe (117KB larger)

make clang-format build speed with unreverted change is 4m51.504s (Release build from clean)

Rebuilding when nothing to rebuild goes from from 5 seconds to 6.7

So as you said it will be a little slower to build clang-format, however, I do notice that all the other clang/tools are pretty much are all building with Frontend. so anyone building a larger range of clang tools probably has everything already built.

I think the main build failure was just the lit tests, I think i would prefer to fix those and reland this as-is, then look for a better diags solution if @klimek thinks this is something I should be concerned about. I really wanted to be able to use a standard Diagnostic front end with all the support for console coloring etc...I'll have to take a look at LLVM's diagnostics to see how that is done.

Thanks for doing this, I am struggling to find the MacOS bot log that failed, are any available via Buildbot? (I notice the log looks like your own machine)

The mac bots are on a different system for some reason: http://green.lab.llvm.org/green/ (see also http://lists.llvm.org/pipermail/llvm-dev/2019-October/135771.html , I wasn't aware of that until a few days ago either).

So as you said it will be a little slower to build clang-format, however, I do notice that all the other clang/tools are pretty much are all building with Frontend. so anyone building a larger range of clang tools probably has everything already built.

I believe all the other clang tools conceptually do semantic analysis, so that makes more sense for them :)

I think the main build failure was just the lit tests, I think i would prefer to fix those and reland this as-is, then look for a better diags solution if @klimek thinks this is something I should be concerned about. I really wanted to be able to use a standard Diagnostic front end with all the support for console coloring etc...I'll have to take a look at LLVM's diagnostics to see how that is done.

I hope my view on this isn't entirely dismissed (see git shortlog -nes clang/lib/Format/).

In addition to the actual size cost of this change: about a third of clang-format's size is the diagnostics table, but before this change here it's kept alive by only two references from lib/Basic and it's reasonably easy to remove these, which should make the binary 1 MB smaller. After this change, that's much harder to do.

LLVM's text diag stuff also supports colors and whatnot.

And another idea: maybe it makes sense to expose these clang-format diagnostics you're adding (which I do think are a cool feature!) via clang-format? That already depends on everything and the kitchen sync, so you could easily output diagnostics from there – and it'd allow clang-format itself to be a tool focused on doing one thing well (namely, formatting code).

Thanks for doing this, I am struggling to find the MacOS bot log that failed, are any available via Buildbot? (I notice the log looks like your own machine)

The mac bots are on a different system for some reason: http://green.lab.llvm.org/green/ (see also http://lists.llvm.org/pipermail/llvm-dev/2019-October/135771.html , I wasn't aware of that until a few days ago either).

So as you said it will be a little slower to build clang-format, however, I do notice that all the other clang/tools are pretty much are all building with Frontend. so anyone building a larger range of clang tools probably has everything already built.

I believe all the other clang tools conceptually do semantic analysis, so that makes more sense for them :)

Agreed..

I think the main build failure was just the lit tests, I think i would prefer to fix those and reland this as-is, then look for a better diags solution if @klimek thinks this is something I should be concerned about. I really wanted to be able to use a standard Diagnostic front end with all the support for console coloring etc...I'll have to take a look at LLVM's diagnostics to see how that is done.

I hope my view on this isn't entirely dismissed (see git shortlog -nes clang/lib/Format/).

Actually I hope not too, I thought this Frontend mechanism was a bit overkill too, but I held back from just doing `llvm::errs() << ..." because I felt that was reinventing the wheel (but it would probably be more space-efficient) especially in terms of dependencies

In addition to the actual size cost of this change: about a third of clang-format's size is the diagnostics table, but before this change here it's kept alive by only two references from lib/Basic and it's reasonably easy to remove these, which should make the binary 1 MB smaller. After this change,

that's much harder to do.

Yes I get that, if I can switch to LLVM diags then I guess can avoid that.  I also wondered if there was some way of me doing this without the `TextDiagnosticPrinter` (which is the only thing that caused Frontend to need to come in, the rest of the Diagnositcs stuff (DiagnosticOptions,DiagnosticEngine etc..) was already being used.

LLVM's text diag stuff also supports colors and whatnot.

Could you or anyone else point me to a good example, I'd definitely like to take a look.

And another idea: maybe it makes sense to expose these clang-format diagnostics you're adding (which I do think are a cool feature!) via clang-format? That already depends on everything and the kitchen sync, so you could easily output diagnostics from there – and it'd allow clang-format itself to be a tool focused on doing one thing well (namely, formatting code).

`via clang-format?` did you mean via clang.exe or clang-check.exe?   I did think of this when I started doing this work, I just came to the conclusion that if I did that it wouldn't be long before someone requested the same features in clang-format itself.

I think from what you've said, trying to break the Frontend dependency is probably important, as for a dependency on lib/Basic I feel I was just asked to make that worse {D68914} perhaps that's why the code was duplicated in the first place...

bear with me I'm still learning all this stuff.
Could you or anyone else point me to a good example, I'd definitely like to take a look.

http://llvm-cs.pcc.me.uk/include/llvm/Support/SourceMgr.h/rSMDiagnostic --

`via clang-format?` did you mean via clang.exe or clang-check.exe?   I did think of this when I started doing this work, I just came to the conclusion that if I did that it wouldn't be long before someone requested the same features in clang-format itself.

Err, via clang-tidy. Sorry, that was a super confusing typo. That seems like a great place for this feature to me.

Could you or anyone else point me to a good example, I'd definitely like to take a look.

http://llvm-cs.pcc.me.uk/include/llvm/Support/SourceMgr.h/rSMDiagnostic --

`via clang-format?` did you mean via clang.exe or clang-check.exe?   I did think of this when I started doing this work, I just came to the conclusion that if I did that it wouldn't be long before someone requested the same features in clang-format itself.

Err, via clang-tidy. Sorry, that was a super confusing typo. That seems like a great place for this feature to me.

clang-tidy is a compiler tool that needs the full compile info, while clang-format is just a parser; I think it'd be actively confusing to have clang-tidy be a clang-format frontend given how different there usage is.

That said, I agree it's worth looking into using LLVM Diags without all of Clang's overhead.

@MyDeveloperDay I think it should be added to the release notes. it is a great new changes for clang format (it would have made my life at Mozilla much easier ;)

@MyDeveloperDay I think it should be added to the release notes. it is a great new changes for clang format (it would have made my life at Mozilla much easier ;)

I agree, to be honest, I like the way clang-tidy does their release notes, that new items are announced, I think it would be useful here to just say a few sentences about new features.

@MyDeveloperDay I think it should be added to the release notes. it is a great new changes for clang format (it would have made my life at Mozilla much easier ;)

I am super honoured you think this is a good change, to be honest in using it this week, I've spent so much time typing find . -name '*.cpp' ... I am even more convinced we need a recursive -r option something like D29039: [clang-format] Proposal for clang-format -r option so plan to pursue that more.

I can't wait for @hans next Windows Snapshot, this is my gate for rolling into our builds and CI system, after that, I'm gonna catch every single person who tries to build/check in unclang-formatted.

hans added a comment.Oct 17 2019, 7:02 AM

I can't wait for @hans next Windows Snapshot, this is my gate for rolling into our builds and CI system, after that, I'm gonna catch every single person who tries to build/check in unclang-formatted.

As it happens, there's a new snapshot out today :-)

I can't wait for @hans next Windows Snapshot, this is my gate for rolling into our builds and CI system, after that, I'm gonna catch every single person who tries to build/check in unclang-formatted.

As it happens, there's a new snapshot out today :-)

@MyDeveloperDay awarded this comment a like token!!! (if I could)

hliao added a subscriber: hliao.Oct 17 2019, 4:51 PM
hliao added inline comments.
clang/test/Format/dry-run-alias.cpp
2–3

Why not check with FileCheck?

clang/test/Format/dry-run.cpp
2–3

Why not check with FileCheck?

MyDeveloperDay marked an inline comment as done.Oct 17 2019, 9:02 PM
MyDeveloperDay added inline comments.
clang/test/Format/dry-run-alias.cpp
2–3

Absolutely no real reason other than my lack of ability to get it working correctly. I was trying to follow the other examples of how it could be written with FIleCheck but failed. In the end, I copied how it was done in some of the other tests for now but If someone could help me I would love to change them to use FileCheck properly.