This is an archive of the discontinued LLVM Phabricator instance.

[git-clang-format]: New option to perform formatting against staged changes only
Needs ReviewPublic

Authored by Alexander-Shukaev on Dec 11 2015, 1:53 PM.

Details

Summary

Imagine a pre-commit hook that should strictly prevent committing incorrectly formatted code. A scrap of it could look as follows:

clang_format() {
  configuration="${PWD}/.clang-format"
  if [ -r "${configuration}" ]; then
    if [ -n "$(parse --verify 'HEAD' 2> '/dev/null')" ]; then
      commit='HEAD'
    else
      commit='4b825dc642cb6eb9a060e54bf8d69288fbee4904'
    fi
    extensions=...
    git clang-format --staged                                                  \
                     --commit="${commit}"                                      \
                     --extensions="${extensions}"                              \
                     --style='file'                                            \
                     "${@}"
  else
    die '%s: %s: %s'                                                           \
        'clang-format'                                                         \
        'Configuration file not found'                                         \
        "${configuration}"
  fi
}
...
set -e
diff="$(clang_format --diff)"
set +e
if [ "${diff}" != 'no modified files to format'           ] &&                 \
   [ "${diff}" != 'clang-format did not modify any files' ]; then
  error="$(clang_format 2>&1 > /dev/null)"
  die "%s: %s\n%s${error:+\n}%s"                                               \
      'clang-format'                                                           \
      'Code style issues detected'                                             \
      "${error}"                                                               \
      "${diff}"
fi

Here is what happens without the new --staged option:

  • Stage badly formatted source file;
  • git commit;
  • The above pre-commit hook will:
    • report error (abort commit),
    • reformat code accordingly,
    • and print the corresponding diff;
  • Don't stage the newly formatted changes (which correspond to the correct expected formatting to be committed);
  • git commit (success).

Notice how commit succeeds and git-clang-format does not detect wrong formatting anymore. It happens because along with staged changes it also considers not staged changes, including the ones that it just has made itself. Thus, it thinks everything is OK with the source code and lets one commit it while part of this correct formatting was not even staged for the commit. Clearly, this is unlikely to be the behavior one would expect in this use case.

NOTE:
--force option has nothing to do with this problem.

With the new --staged option, git-clang-format is instructed to consider only staged changes when performing formatting or producing diffs. As a result, no matter how many times one issues git commit, they all will be aborted until one stages the correctly formatted code.

Diff Detail

Repository
rL LLVM

Event Timeline

Alexander-Shukaev retitled this revision from to New 'git-clang-format' option to perform formatting against staged changes only.
Alexander-Shukaev updated this object.
Alexander-Shukaev added a reviewer: djasper.
Alexander-Shukaev set the repository for this revision to rL LLVM.
Alexander-Shukaev changed the edit policy from "All Users" to "Administrators".
Alexander-Shukaev added a subscriber: djasper.

By no means I want to smell like an annoying bumper here, but just out of curiosity, is there any particular reason why the review on the patch is being delayed or reviewers are merely overloaded with other issues? Maybe I could add other reviewers if that's applicable?

After applying your patch, when I run:

$ git clang-format --staged

I get:

Traceback (most recent call last):
  File "/usr/local/bin/git-clang-format", line 511, in <module>
    main()
  File "/usr/local/bin/git-clang-format", line 147, in main
    old_tree = run_git_ls_files_and_save_to_tree(changed_lines)
  File "/usr/local/bin/git-clang-format", line 343, in run_git_ls_files_and_save_to_tree
    return create_tree(index_info_generator(), '--index-info')
  File "/usr/local/bin/git-clang-format", line 371, in create_tree
    for line in input_lines:
  File "/usr/local/bin/git-clang-format", line 333, in index_info_generator
    os.environ['GIT_INDEX_FILE'] = index_path
  File "/usr/lib/python2.7/os.py", line 471, in __setitem__
    putenv(key, item)
TypeError: must be string, not None

Any ideas?

  1. Fixed exception.
  2. Ensure to use the '--cached' option for diff only when the '--staged' option is supplied (just for logical correctness in general, but makes no real difference for 'git-clang-format' in particular).

The patch was updated accordingly. Please, give it one more shot. Thanks.

Can somebody, please, review the patch and provide feedback? It's been almost month since submission. I can add more reviewers if somebody points me to them.

Alexander-Shukaev retitled this revision from New 'git-clang-format' option to perform formatting against staged changes only to [git-clang-format]: New option to perform formatting against staged changes only.Feb 8 2016, 9:14 AM
Alexander-Shukaev added a subscriber: cfe-commits.
lodato requested changes to this revision.Feb 9 2016, 8:58 AM
lodato added a reviewer: lodato.
lodato added a subscriber: lodato.

This does not work properly because it calls clang-format on the files in the working directory, even if --staged is given. To fix, you need to somehow pass in the version of the files in the index into clang-format. To do that, I think you'd want to pass in the blob via stdin and add -assume-filename=... to the command line.

Example:

$ mkdir tmp
$ cd tmp
$ git init
$ cat > foo.cc <<EOF
int main() {
  int x = 1;
  return 0;
}
EOF
$ git add foo.cc
$ git commit -m 'initial commit'
$ sed -i -e 's/1/2/g' foo.cc
$ git add foo.cc
$ rm foo.cc
$ cat > foo.cc <<EOF
int main() {
  printf("hello\n");
  printf("goodbye\n");
  return 0;
}
EOF

Now we have the situation where the stage and the working directory differ, but both are clang-format-compatible and should produce no changes.

$ git diff --cached
diff --git c/foo.cc i/foo.cc
index cb7e0b0..0a5833c 100644
--- c/foo.cc
+++ i/foo.cc
@@ -1,4 +1,4 @@
 int main() {
-  int x = 1;
+  int x = 2;
   return 0;
 }
$ git diff
diff --git i/foo.cc w/foo.cc
index 0a5833c..b821f3e 100644
--- i/foo.cc
+++ w/foo.cc
@@ -1,4 +1,5 @@
 int main() {
-  int x = 2;
+  printf("hello\n");
+  printf("goodbye\n");
   return 0;
 }

However, your script produces the following, which is clearly incorrect:

$ git-clang-format --staged --diff
diff --git a/foo.cc b/foo.cc
index 0a5833c..b821f3e 100644
--- a/foo.cc
+++ b/foo.cc
@@ -1,4 +1,5 @@
 int main() {
-  int x = 2;
+  printf("hello\n");
+  printf("goodbye\n");
   return 0;
 }
git-clang-format
104

Please move this down after -q so it stays sorted.

Also I would drop the -s since the this will be an uncommon option.

Finally, could you add aliases=['cached'] to mirror the behavior of git diff?

124

This will work without --diff (otherwise it will try to apply changes in the index to the working directory, which doesn't make sense), so could you please add a check that --staged requires --diff?

250

I suggest sticking with the name staged throughout the code.

328

There is already a git command that does this, git write-tree, so you can replace this entire function with a call to run('git', 'write-tree').

This revision now requires changes to proceed.Feb 9 2016, 8:58 AM
lodato added inline comments.Feb 9 2016, 9:05 AM
git-clang-format
124

Oops, will not work.

nevion added a subscriber: nevion.Aug 23 2017, 5:54 PM

Hi - I'm coming here from Alexander's post on stackoverflow and I'm interested to see both how this solution is progressing (no replies for 1.5 years is not a good sign), verify it still works, and to check if there are better options available hedging if this is stuck in limbo and out of usability date.

Man, I have to admit it's really a shame that I didn't find time to work on this further but I'm truly too busy these days. However, I believe the primary point why I didn't have motivation to do this is because the flaw that was pointed out actually never bothered me in real life simply because I've never ever hit this case in production. I confess that I use this solution, namely the one linked on Stack Overflow (to my personal Bitbucket repository) every single day and we even introduced it in my company. Nobody ever complained so far. It's true that sometimes one would want to stage only some changes and commit them, while having other changes unstaged, but I don't remember when I was in that situation last time. If you really want to leave something out for another commit, then you can also stash those unstaged changes before you format/commit the staged ones. Furthermore, let me clarify why the proposal by @lodato might not even fit into the picture (i.e. there is no universal solution this problem as I see it until now). In particular, his example does not illustrate another side of the medal, namely let's say that the staged code was, in fact, badly formatted (not like in his example), and then you apply some code on top of it that is not yet staged (same like in his example). By "on top" I mean really like he shows it, that those changes overlap, that is if you'd stage them, they'd overwrite the previously staged ones (which in our imaginary example are badly formatted now). Now let's think what will happen if we follow his proposal. We'd apply formatting purely to the "staged" version of the file by piping it from index as a blob directly to formatter, so far so good. Or wait a minute, how would you actually format that file in place then? That is you already have unstaged and potentially conflicting changes with the ones you'd get as a result of formatting the staged ones but how to reconcile these two versions now? That is how do you put those formatted changes into unstaged state when you already have something completely different but also unstaged at the same line? Turns out that the answer is, you can't, without introducing explicit conflicts in the unstaged tree, which is even more confusing to my taste. Or would you just report the diff with an error message to the user leaving the act of applying those to them manually? You could, but then you give up on that cool feature of automatic formatting. To conclude, which approach you take, is all about pros and cons. On the daily basis and from productivity standpoint, I care more about doing my changes for the target commit, trying to commit, if something is wrong getting it automatically formatted in the unstaged tree, reviewing this unstaged diff quickly, staging it, and finally committing my work. That corner case with some irrelevant changes hanging in the unstaged tree and fooling formatter can rarely be a problem. And even then, I treat it more like an answer from a formatter: "Hey bro, you have some unstaged crap out there, can you please already decide whether you need it now or not, otherwise I can't do my job for you?!", in which case I will

  • either actually stage them if I really need them,
  • or stash them if I want to deal with them later,
  • or discard them altogether if they are garbage,

all of which will allow formatter to do it's job and me to finally produce the commit.

nevion added a comment.EditedAug 25 2017, 3:40 PM

Can anyone inform me why these don't work on clang-format-diff?

Further - I feel it should be simple to make this work properly like git-diff, although I do think legato's case is off the norm and prevents a usable solution, I can't understand why the bug/issue even arises, nor what the fix is.. but it seems like it should be a simple thing.

Alexander-Shukaev edited edge metadata.

Alright, so you got me excited about this task once again. Now, I've just rebased to the latest git-clang-format and it has changed a bit. Nevertheless, I've updated the previous patch accordingly and applied those changes which @lodato was proposing (except for getting rid of run_git_ls_files_and_save_to_tree as I'm still not sure whether this would be the same as git write-tree). Anyway, just tested the exact scenario posted by @lodato, and indeed with the current patch it works as he expects. Namely, there is no diff from unstaged tree interfering anymore. So far so good, but as I said in my previous post, there is another side of the medal. If you didn't get it, then read through it again and try, for example, the following amended @lodato's scenario:

$ mkdir tmp
$ cd tmp
$ git init
$ cat > foo.cc <<EOF
int main() {
  int x = 1;
  return 0;
}
EOF
$ git add foo.cc
$ git commit -m 'initial commit'
$ sed -i -e 's/1/\n2\n/g' foo.cc
$ git add foo.cc
$ rm foo.cc
$ cat > foo.cc <<EOF
int main() {
  printf("hello\n");
  printf("goodbye\n");
  return 0;
}
EOF

Notice the newlines around 2. If you now do git commit, those bad newlines in the staged tree will for sure get fixed but at the cost that you will lose your changes in unstaged tree without any notification. That is as I said, they will merely be overwritten by Clang-Format. There are several possibilities to improve the implementation from here that I can think of:

  1. Introduce an explicit conflict in unstaged tree to have both previous unstaged changes and changes done by Clang-Format (to reformat the staged tree) for manual resolution by users;
  2. Report an error with a diff of those unstaged lines which prevent Clang-Format from doing it's job due to conflict, which is technically speaking already the same as it was before that patch;

Finally, regardless of what would be the choice between these two options, I'd reuse the --force option to forcefully overwrite the unstaged tree with the changes done by Clang-Format, essentially what the current patch does already anyway. What do you guys think? Contributions and suggestions are very welcome; let's have another round to get it merged upstream!

Did anybody have a chance to review it and/or try it out?

I'm paying attention at least. I updated your patch prior to your posting and temporarily made due with it. I'm pretty nervous that I will lose work with my commit style and the lingering issue. Based on what I've seen so far I can't use the git hooks and so I want to do this more manually which entices more accidents as that's when staged data is more frequent.

Sorry, I have been very busy with other work so I haven't had a chance to follow along. (I don't work on LLVM team - I just contributed this script.)

I'll try to carve out some time to review within the next week.

Mark, just wanted to check if the review is still somewhere on your radar.

I think the simplest solution to those problems is to require --diff. An alternative is to write the changes directly to the index without touching the working directory, but that would require some flag because the behavior is unintuitive, and the implementation would be complicated enough to warrant its own patch.

I reimplemented your patch in D41147 based on a significant refactoring, which I hope makes the code more clear.

Oh, and I meant to start with: I'm so sorry for the extremely long delay. I was swamped with work before then I forgot about this. Please know that I appreciate your effort here and that I didn't mean to blow you off.

Best regards, Mark