Page MenuHomePhabricator

[clang-format] Proposal for clang-format -r option
Needs ReviewPublic

Authored by stryku on Jan 23 2017, 9:50 AM.

Details

Summary

Hello Llvm Community,

I would like to present you a small clang-format proposal.

I want to propose a '-r' option, with which, clang-format will search for files, which will match regex[s] (passed as a command line arguments, like normal files), down in the directories tree.

What motivates such changes is more user-friendly clang-format. Now when user want to do format all files in the current location and subdirectories,, he need to write some scripts/use bash magic or rely on that, that shell itself will expand regex into file names. On Windows, it's even harder. Sample clang-format call, for c++ header and source files, would look e.g. like this:
clang-format -i -r .\.cpp$ .\.hpp$

This patch provides strict regex matching, so such call won't work:
'clang-format -i -r *.cpp'

Next thing to implement could be some kind of 'easy regexs' or so, which would accept '*.cpp', but IMHO this isn't in the scope of this patch, so I didn't implement it.

I didn't write any test or so. Before, I wanted to write this mail and get feedback if -r option is even a thing and you will like it. After your acceptance (if it'll be an acceptance), I'll write tests, update documentation etc.

If you have questions/comments/accepts/rejects feel free (:

P.S. I don't know why there is so many changes in lines that I didn't even touched. I've just ran clang-format -i -style=file to format this file.

Diff Detail

Repository
rL LLVM

Event Timeline

stryku created this revision.Jan 23 2017, 9:50 AM
rsmith edited reviewers, added: djasper; removed: rsmith.Jan 23 2017, 11:17 AM
rsmith added a subscriber: cfe-commits.
djasper edited edge metadata.Jan 23 2017, 3:16 PM

I am happy to let other people in the community weigh in, but I would not move forward with this patch. Listing directories is not a task that clang-format should do. It does not seem useful to me to add this functionality to basically every single tool that you might want to pipe files through. That would mean that you'd have to repeat the same code many times. Also, I would not call this "bash magic", these are more like "bash basics". I don't know how much harder this is on windows, but still solving that doesn't sound like something clang-format should fix.

The other point I'd like to make is that it rarely makes sense to format an entire directory structure with clang-format. It is actually much more useful if integrated into editors (plugins available) or the version control system (e.g. through git-clang-format). That's also why you see many diffs here, we don't reformat all of clang-format on every commit, just the changed lines. The rest might go sightly out of sync with clang-format's formatting, but that's ok, no harm done. Often this is preferable over lots of spurious diffs as present here.

@djasper You're probably right. Thanks anyway for your time (:

No problem :)

djasper resigned from this revision.Jan 31 2017, 3:57 AM
MyDeveloperDay added a project: Restricted Project.Wed, Sep 25, 1:50 AM
Herald added a project: Restricted Project. · View Herald TranscriptWed, Sep 25, 1:50 AM
MyDeveloperDay added a comment.EditedSun, Oct 6, 3:10 AM

When I first saw this issue some time ago I tended to agree with the comment regarding the "its not clang-formats" job to do that...

However, over the last couple of months, I've seen more and more twitter and stack-overflow comments regarding "how do I check my whole tree to see what needs clang formatting"

The solutions seem to be a mixture of using various shell tools, like find and grep depending if you are using cmd, bash, make etc.. and to determine if a change is indeed needed a quite convoluted series of commands looking for grepping for "<replacements>"

Finally, a quote on twitter by @invalidop made me want to change the way I use clang-format.

If it's not formatted with clang-format it's a build error.

When I'm making a change to clang-format itself, one thing I like to do to test the change is to ensure I didn't cause a huge wave of changes, what I want to do is simply run this on a known formatted directory and see if any new differences arrive.

This started me thinking that we should allow build systems to run clang-format on a whole tree and emit compiler style warnings about files that fail clang-format in a form that would make them as warning in most build systems and because those build systems range in their construction I don't think its unreasonable to NOT expect them to have to do the directory searching or parsing the output replacements themselves, but simply transform that into an error code when there are changes required.

To this end, I would like to resurrect these ideas of searching a tree, along with another idea I have regarding adding a -n or -dry-run command line argument which would emit a warning/error of the form

fileXXX.XXX:29:3: warning: file requires a clang-format [-Wfile-needs-clang-format]

I envisage being able to use -Wno-file-needs-clang-format and -Werror to escalate them to errors. I know for my own team we'd use this to scan a directory during a build, bring clang-format failures to the front, and this would prevent clang-formatting errors creeping in to the build which seems to happen from the few people who don't use it integrated into an editor or those who forget to run git-clang-format.

my question to you @stryku is are you still interested in pursuing this patch, or are you ok with us consuming it (at least partially) into another piece of work?

Thanks in advance

MyDeveloperDay retitled this revision from Proposal for clang-format -r option to [clang-format] Proposal for clang-format -r option.Sun, Oct 6, 3:22 AM
MyDeveloperDay removed a project: Restricted Project.
MyDeveloperDay removed a subscriber: llvm-commits.
Herald added a project: Restricted Project. · View Herald TranscriptSun, Oct 6, 3:22 AM

I agree with @djasper that this is outside the scope of clang-format. git for Windows gives you a full set of bash utilities to utilize, so doing stuff like this on Windows is much easier now.

tools/clang-format/ClangFormat.cpp
345

I could imagine the path you supplying being just a directory . or ../../include/Format as much as a wild card, once in that directory I might want to you clang-format all the file types clang-format supports or just a few? how could we do that?

it would be nice if we could supply multiple directories on the command line

clang-format -r -i  . .. ../include/Format

I wonder if the wild card should act more like --gtest_filter

--gtest_filter=POSTIVE_PATTERNS[-NEGATIVE_PATTERNS]

or find's -iregex?

In Windows you just write a Python script to do this, and this works everywhere, not just on one platform or the other, so bash isn't even necessary and Python is easy to write so I wouldn't really say it's "even harder". If you google for run-clang-format.py you'll find a script that actually branches out and does this in parallel. There's a lot of logic and smarts you could bake into an external tool which can then be used for many different programs, not just clang-format.

In Windows you just write a Python script to do this, and this works everywhere, not just on one platform or the other, so bash isn't even necessary and Python is easy to write so I wouldn't really say it's "even harder". If you google for run-clang-format.py you'll find a script that actually branches out and does this in parallel. There's a lot of logic and smarts you could bake into an external tool which can then be used for many different programs, not just clang-format.

@zturner thanks for the feedback, I took a look and run-clang-format.py at (https://github.com/Sarcasm/run-clang-format), and whilst I have a reasonable knowledge of python, reading the python code just makes me ask why isn't all of this just in clang-format in the first place?

To me I think these are unaddressed clang-format requirements that other third-party developers are providing, without having to conform to some agreed upstream set of rules or gaining consensus. The author has coded it the way they want, how they want and can add to it to do what they want (good on them). But that doesn't mean that's the only approach or best choice of technology, or that someone should follow the same model or develop something similar. Its one option, a good option, but not the only option.

I've personally been burnt on Windows using a combination of python in both the python windows distribution and cygwin together, the windows version doesn't understand Cygwin paths (as expected). I know I do my own development in clang with 1 Visual Studio Command Prompt setup to pick up VS2019 and one bash prompt to ensure it picks up python correctly, (I don't use CMake to make visual studio projects but use the Code Blocks CMake options as that works nicely with cygwin, allowing me to do a make -jN which is essential, the visual studio projects are pretty unworkable)

I have to build in the windows shell so it finds the correct compiler cl.exe and not the cygwin's gcc, but I can't run git clang-format or git llvm push in those windows shells as the python breaks. and that all comes down to the interaction of different pythons. (and I just cannot justify to myself proliferating that pain to others by using python as a solution when we have a perfectly good solution in C++) . Ask yourself how many people have trouble getting the lit tests to work in LLVM on windows I know I do. Which I'm sure is resolved by some configuration magic but I haven't found it yet.

But I am grateful for the evidence that needing this capability is obviously required (otherwise someone wouldn't have written this script)

if we take a look at the script what is it doing... the script is ~ 350 lines

  1. A certain section is covering recursive running (about 20-40 lines)
  2. A reasonable amount is simply handling the command line arguments (30-50 lines)
  3. About 40 lines is colourizing a diff
  4. And about 70-100 lines is handling running in parallel

The command-line options it supports just:

--clang-format-executable (to tell it where to find the executable)
-extensions to tell it what to file types to handle (which actually their list is out of date as it only covers some C++ code and not js,C#,protobuf etc)
-r for recursive
-q for quiet
-j for the number of parallel jobs
-color for showing a colored diff
--exclude for paths to exclude.

Of that which parts are useful? the colored diff? maybe.. the parallelizm is nice but that's not essential in my view on a fast machine and there are other ways of doing that especially in makefiles which are already running -j N maybe you could even use gnu parallel for that?

So basically we are down to --extensions, -r, --exclude ... as perhaps the real benefit, I think its this functionality that this revision is/should be adding. I think its relatively simple and is already partially covered in about 30+ lines above.

But the existence of this script just means I'm even more sold on the fact that we should do it..so I really appreciate you bringing that to my attention.