This is an archive of the discontinued LLVM Phabricator instance.

clang-format: Add an option to git-clang-format to diff between to commits
ClosedPublic

Authored by lhchavez on Sep 7 2016, 3:09 PM.

Details

Summary

When building pre-upload hooks using git-clang-format, it is useful to limit the scope to a diff of two commits (instead of from a commit against the working tree) to allow for less false positives in dependent commits.

This change adds the option of specifying two git commits to git-clang-format when using the --diff flag, which uses a different strategy to diff (using git-diff-tree instead of git-diff-index), and runs clang-format against the second commit instead of the working directory.

There is a slight backwards-incompatibility introduced with this change: if a filename matches a branch name or other commit-ish, then git clang-format <commit> <file> will no longer work as expected; use git clang-format <commit> -- <file> instead.

Diff Detail

Event Timeline

lhchavez updated this revision to Diff 70607.Sep 7 2016, 3:09 PM
lhchavez retitled this revision from to clang-format: Add a flag to limit git-clang-format's diff to a single commit.
lhchavez updated this object.
lhchavez added a reviewer: djasper.
lhchavez set the repository for this revision to rL LLVM.
lhchavez added a project: Restricted Project.
lhchavez added subscribers: srhines, cfe-commits.
lodato added a subscriber: lodato.Sep 12 2016, 3:46 PM

Hi lhchavez,

This patch does not work as intended. If I understand correctly, you want to see if a given <commit> has any clang-format problems. However:

  1. This patch only computes changed_lines from <commit> but then runs clang-format on the files in the working directory. Instead, you need to somehow update run_clang_format_and_save_to_tree() to operate on <commit>.
  1. Unless --diff is given, this new mode writes clang-formatted contents of <commit> to the working directory, blowing away any changes since then. The simple solution is to require --diff whenever this new mode is used, since it's not obvious what should be written to the working directory.

See also D15465, which had similar problems.

Here is an example showing the problem. The last command should have said that aa207 was bad, but since the current working directory is fine, it shows no diff.

$ git log -p --reverse
commit a2765ba80f03f02aaa0cefbfe705ae844c219065
Author: ...
Date:   2016-09-12 16:29:53 -0400

    initial commit; x and y bad

diff --git a/foo.cc b/foo.cc
new file mode 100644
index 0000000..08422a0
--- /dev/null
+++ b/foo.cc
@@ -0,0 +1,4 @@
+void foo() {
+  int x=1;
+  int y=2;
+}

commit aa207f7991d26d1ee6826bf272f8eefffdf31b31
Author: ...
Date:   2016-09-12 16:30:19 -0400

    modify x, still bad

diff --git a/foo.cc b/foo.cc
index 08422a0..dd75f40 100644
--- a/foo.cc
+++ b/foo.cc
@@ -1,4 +1,4 @@
 void foo() {
-  int x=1;
+  int x=3;
   int y=2;
 }

commit 72858d5d2b13ed9dcf6a3328fef43a81ee3f01fc
Author: ...
Date:   2016-09-12 16:30:58 -0400

    modify both lines, still bad

diff --git a/foo.cc b/foo.cc
index dd75f40..3eff9ca 100644
--- a/foo.cc
+++ b/foo.cc
@@ -1,4 +1,4 @@
 void foo() {
-  int x=3;
-  int y=2;
+  int x=30;
+  int y=20;
 }

commit 9e6bfa1b9582b1423efc6d2339e269636fa0718b
Author: ...
Date:   2016-09-12 16:32:21 -0400

    clang-format

diff --git a/foo.cc b/foo.cc
index 3eff9ca..ac4d00d 100644
--- a/foo.cc
+++ b/foo.cc
@@ -1,4 +1,4 @@
 void foo() {
-  int x=30;
-  int y=20;
+  int x = 30;
+  int y = 20;
 }
$ ~/tmp/git-clang-format --single-commit aa207
clang-format did not modify any files
cfe/trunk/tools/clang-format/git-clang-format
93 ↗(On Diff #70607)

nit: I find this flag confusing since it does not follow any git convention. Instead, I suggest making the interface similar to git diff: if two <commit>s are given, take the diff of those two trees to find the list of changed lines, then run clang-format on the second commit.

For example:

  • git clang-format --diff HEAD HEAD~ would tell you if HEAD was properly clang-formatted or not.
  • git clang-format --diff 8bf003 ae9172 would tell you if any of the lines in ae9172 that differ from 8bf003 are not properly clang-formatted.

    operate in this new mode only if two <commit>s are given. Then the interface would be, for example, git clang-format abcd1234 abcd1234~.
323 ↗(On Diff #70607)

Unless I'm mistaken, this function is unnecessary. There is no need to filter out files in the tree that do not match the pattern, since the filtering happens in compute_diff() (cmd.extend(files)).

I'll post a no-op change with arcanist to fix the paths (I shouldn't have tried to manually upload the diff), and then another one to fix the patch so that the script actually does what it advertises and add the comment explaining the reason for create_tree_from_commit().

I'll tackle the interface change in the change following that one, after some more experimentation.

cfe/trunk/tools/clang-format/git-clang-format
93 ↗(On Diff #70607)

The thing I liked about using git-diff-tree is that you can pass a single commit and even if it's a merge-commit, it does the right thing. I'll try that idea out and allow more than two commits and see how it behaves.

323 ↗(On Diff #70607)

The reason I added this is to prevent the generation of much larger temporary indices for large projects (like Chromium) vs. git-clang-format in normal mode. I can add a comment explaining that.

lhchavez updated this revision to Diff 71095.Sep 12 2016, 6:55 PM

No-op update. Using arcanist to fix the diff and the paths.

lhchavez updated this revision to Diff 71097.Sep 12 2016, 6:58 PM

Fix the script so it does what it advertises.

lhchavez updated this revision to Diff 71103.Sep 12 2016, 9:14 PM

Using lodato's proposed interface. This patch:

  • Accepts an arbitrary number of commits as arguments. Validation will be done in main(), such that two commits are valid only when running in --diff mode.
  • Allows diffing two commits against each other in addition to diffing against the working directory (still using git diff-tree). The previous version didn't really work with merge-commits anyways.
  • Runs clang-format against the file in the working directory (if a single commit is passed), or against the second commit.
lhchavez marked 3 inline comments as done.Sep 12 2016, 9:17 PM
lhchavez added inline comments.
cfe/trunk/tools/clang-format/git-clang-format
93 ↗(On Diff #70607)

Nevermind, git diff-tree didn't work with merge-commits if a single commit is passed. Changed to your proposal.

lhchavez marked 2 inline comments as done.Sep 12 2016, 11:30 PM
lhchavez added inline comments.
cfe/trunk/tools/clang-format/git-clang-format
323 ↗(On Diff #70607)

Wait, I understand what you mean now. I can just extract the tree from the commit and use that as diff base. Changed it to a simpler git show command.

lhchavez updated this revision to Diff 71111.Sep 12 2016, 11:30 PM
lhchavez marked an inline comment as done.

Got rid of create_tree_from_commit

lhchavez retitled this revision from clang-format: Add a flag to limit git-clang-format's diff to a single commit to clang-format: Add an option to git-clang-format to diff between to commits.Sep 12 2016, 11:41 PM
lhchavez updated this object.

Gentle ping. Is there anything else that needs addressing? Did I miss anything?

Sorry for the delay - I haven't had a chance to review. I'll be sure to review by tomorrow. Thanks for the updates!

Could you add a note to the commit description to say that there is a backwards incompatibility: if a filename matches a branch name or other commit-ish, then git clang-format <commit> <file> will no longer work as expected; use git clang-format <commit> -- <file> instead.

tools/clang-format/git-clang-format
35

add a second [<commit>]

38–40

suggestion:

If zero or one commits are given, <old text>.

If two commits are given (requires --diff), run clang-format on all lines in the second <commit> that differ from the first <commit>.

130

This error message is a bit confusing. I think it would be clearer to do:

if len(commits) > 1:
  if not opts.diff:
    die('--diff is required when two commits are given')
elif len(commits) > 2:
  die('at most two ...
191

nit: Python style is to put the whole thing on a single line. Maybe remove the two ...s or the [--]?

323

I don't think this function is necessary. All git commands that take a tree actually take a "tree-ish", which is a tree, commit, tag, or branch. I just tried removing this function and it appeared to work.

(If it ends up this function is necessary, I think the proper thing to do is git rev-parse <commit>^{tree} since git-show is a porcelain command.)

391
Runs on the file in `revision` if not None, or on the file in the working directory if `revision` is None.
400

Should use cat-file instead of show - the former is plumbing while the latter is porcelain (see man git). That is, git cat-file blob %s:%s.

454

Add --diff-filter=M. Without this, the command prints all unmodified files as deleted since old_tree has all the files in the case of multi-commit mode, but new_tree only has modified files. I suggest adding a comment explaining this.

To test, try running git clang-format --diff HEAD~30 HEAD on the LLVM repo. You'll see that without the diff filter, you get a 100MB+ diff! :)

462

Might as well add --diff-filter=M here too

lhchavez updated this revision to Diff 72064.Sep 21 2016, 8:57 AM

Addressed lovato's comments

lhchavez updated this object.Sep 21 2016, 8:59 AM
lhchavez updated this object.
lhchavez marked 9 inline comments as done.Sep 21 2016, 9:01 AM

Addressed all comments. PTAAL.

lodato accepted this revision.Sep 21 2016, 2:15 PM
lodato added a reviewer: lodato.

Nice feature. Thanks for the patch!

This revision is now accepted and ready to land.Sep 21 2016, 2:15 PM
This revision was automatically updated to reflect the committed changes.