This is an archive of the discontinued LLVM Phabricator instance.

Differential D15465

[git-clang-format]: New option to perform formatting against staged changes only
Needs Review, Public

Authored by Alexander-Shukaev on Dec 11 2015, 1:53 PM.

Details

Reviewers

klimek
djasper
DJKessler
lodato

Summary

Imagine a pre-commit hook that should strictly prevent committing incorrectly formatted code. A scrap of it could look as follows:

clang_format() {
  configuration="${PWD}/.clang-format"
  if [ -r "${configuration}" ]; then
    if [ -n "$(git rev-parse --verify 'HEAD' 2> '/dev/null')" ]; then
      commit='HEAD'
    else
      commit='4b825dc642cb6eb9a060e54bf8d69288fbee4904'
    fi
    extensions=...
    git clang-format --staged                                                  \
                     --commit="${commit}"                                      \
                     --extensions="${extensions}"                              \
                     --style='file'                                            \
                     "${@}"
  else
    die '%s: %s: %s'                                                           \
        'clang-format'                                                         \
        'Configuration file not found'                                         \
        "${configuration}"
  fi
}
...
set -e
diff="$(clang_format --diff)"
set +e
if [ "${diff}" != 'no modified files to format'           ] &&                 \
   [ "${diff}" != 'clang-format did not modify any files' ]; then
  error="$(clang_format 2>&1 > /dev/null)"
  die "%s: %s\n%s${error:+\n}%s"                                               \
      'clang-format'                                                           \
      'Code style issues detected'                                             \
      "${error}"                                                               \
      "${diff}"
fi

Here is what happens without the new --staged option:

1. Stage a badly formatted source file;
2. git commit;
3. The above pre-commit hook will:
   - report an error (abort the commit),
   - reformat the code accordingly,
   - and print the corresponding diff;
4. Don't stage the newly formatted changes (which correspond to the correct expected formatting to be committed);
5. git commit (success).

Notice how the second commit succeeds and git-clang-format no longer detects the wrong formatting. This happens because, along with staged changes, it also considers unstaged changes, including the ones it has just made itself. Thus, it thinks everything is OK with the source code and lets one commit it, even though part of the correct formatting was never staged for the commit. Clearly, this is unlikely to be the behavior one would expect in this use case.

NOTE:
--force option has nothing to do with this problem.

With the new --staged option, git-clang-format is instructed to consider only staged changes when performing formatting or producing diffs. As a result, no matter how many times one issues git commit, they all will be aborted until one stages the correctly formatted code.
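
For readers who prefer a Python hook, the check performed by the shell scrap above could be sketched roughly as follows. This is only a sketch under assumptions: `is_clean` and `main` are hypothetical names, the two status messages are the ones the shell scrap compares against, and it presumes a git-clang-format that accepts `--staged`.

```python
import subprocess

# Messages that the shell scrap above treats as "nothing to fix".
CLEAN_MESSAGES = (
    'no modified files to format',
    'clang-format did not modify any files',
)


def is_clean(output):
    """True if git-clang-format reported that nothing needs reformatting."""
    return output.strip() in CLEAN_MESSAGES


def main():
    """Return 0 if the staged changes are formatted, 1 otherwise."""
    result = subprocess.run(
        ['git', 'clang-format', '--staged', '--diff'],
        stdout=subprocess.PIPE, universal_newlines=True)
    if is_clean(result.stdout):
        return 0
    print('clang-format: Code style issues detected')
    print(result.stdout)
    return 1
```

A hook file would finish with `sys.exit(main())`; a non-zero exit status from a pre-commit hook aborts the commit.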

Diff Detail

Repository: rL LLVM

Event Timeline

Alexander-Shukaev updated this revision to Diff 42564. Dec 11 2015, 1:53 PM

Alexander-Shukaev retitled this revision from to New 'git-clang-format' option to perform formatting against staged changes only.

Alexander-Shukaev updated this object.

Alexander-Shukaev added a reviewer: djasper.

Alexander-Shukaev set the repository for this revision to rL LLVM.

Alexander-Shukaev changed the edit policy from "All Users" to "Administrators".

Alexander-Shukaev added a subscriber: djasper.

Alexander-Shukaev updated this object. Dec 11 2015, 1:58 PM

By no means do I want to come across as an annoying bumper here, but just out of curiosity: is there any particular reason why the review of the patch is being delayed, or are the reviewers merely overloaded with other issues? Maybe I could add other reviewers if that's applicable?

After applying your patch, when I run:

$ git clang-format --staged

I get:

Traceback (most recent call last):
  File "/usr/local/bin/git-clang-format", line 511, in <module>
    main()
  File "/usr/local/bin/git-clang-format", line 147, in main
    old_tree = run_git_ls_files_and_save_to_tree(changed_lines)
  File "/usr/local/bin/git-clang-format", line 343, in run_git_ls_files_and_save_to_tree
    return create_tree(index_info_generator(), '--index-info')
  File "/usr/local/bin/git-clang-format", line 371, in create_tree
    for line in input_lines:
  File "/usr/local/bin/git-clang-format", line 333, in index_info_generator
    os.environ['GIT_INDEX_FILE'] = index_path
  File "/usr/lib/python2.7/os.py", line 471, in __setitem__
    putenv(key, item)
TypeError: must be string, not None

Any ideas?

Fixed exception.
Ensured that the '--cached' option is passed to diff only when the '--staged' option is supplied (just for logical correctness in general; it makes no real difference for 'git-clang-format' in particular).

The patch was updated accordingly. Please, give it one more shot. Thanks.
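
The traceback above comes from assigning None to an `os.environ` entry, which Python rejects. One possible way to swap `GIT_INDEX_FILE` temporarily without hitting that case is a small context manager. The sketch below is an illustration only: `temporary_env` is a hypothetical helper and is not part of git-clang-format or of this patch.

```python
import contextlib
import os


@contextlib.contextmanager
def temporary_env(key, value):
    """Set os.environ[key] to value (or unset it if value is None), then restore it."""
    missing = object()  # sentinel: distinguishes "unset" from any real value
    old = os.environ.get(key, missing)
    try:
        if value is None:
            # Never assign None, and never `del` a key that may be absent.
            os.environ.pop(key, None)
        else:
            os.environ[key] = value
        yield
    finally:
        if old is missing:
            os.environ.pop(key, None)
        else:
            os.environ[key] = old
```

Using `pop(key, None)` instead of `del os.environ[key]` also avoids a KeyError when the variable was never set in the first place.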

Can somebody please review the patch and provide feedback? It's been almost a month since submission. I can add more reviewers if somebody points me to them.

Alexander-Shukaev added reviewers: DJKessler, klimek. Jan 13 2016, 9:26 AM

Alexander-Shukaev added a subscriber: klimek.

Alexander-Shukaev retitled this revision from New 'git-clang-format' option to perform formatting against staged changes only to [git-clang-format]: New option to perform formatting against staged changes only. Feb 8 2016, 9:14 AM

Alexander-Shukaev added a subscriber: cfe-commits.

This does not work properly because it calls clang-format on the files in the working directory, even if --staged is given. To fix, you need to somehow pass in the version of the files in the index into clang-format. To do that, I think you'd want to pass in the blob via stdin and add -assume-filename=... to the command line.

Example:

$ mkdir tmp
$ cd tmp
$ git init
$ cat > foo.cc <<EOF
int main() {
  int x = 1;
  return 0;
}
EOF
$ git add foo.cc
$ git commit -m 'initial commit'
$ sed -i -e 's/1/2/g' foo.cc
$ git add foo.cc
$ rm foo.cc
$ cat > foo.cc <<EOF
int main() {
  printf("hello\n");
  printf("goodbye\n");
  return 0;
}
EOF

Now we have the situation where the stage and the working directory differ, but both are clang-format-compatible and should produce no changes.

$ git diff --cached
diff --git c/foo.cc i/foo.cc
index cb7e0b0..0a5833c 100644
--- c/foo.cc
+++ i/foo.cc
@@ -1,4 +1,4 @@
 int main() {
-  int x = 1;
+  int x = 2;
   return 0;
 }
$ git diff
diff --git i/foo.cc w/foo.cc
index 0a5833c..b821f3e 100644
--- i/foo.cc
+++ w/foo.cc
@@ -1,4 +1,5 @@
 int main() {
-  int x = 2;
+  printf("hello\n");
+  printf("goodbye\n");
   return 0;
 }

However, your script produces the following, which is clearly incorrect:

$ git-clang-format --staged --diff
diff --git a/foo.cc b/foo.cc
index 0a5833c..b821f3e 100644
--- a/foo.cc
+++ b/foo.cc
@@ -1,4 +1,5 @@
 int main() {
-  int x = 2;
+  printf("hello\n");
+  printf("goodbye\n");
   return 0;
 }
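
The fix suggested at the top of this comment (read the blob from the index, feed it to clang-format on stdin, and pass -assume-filename) might look roughly like the sketch below. The function names are hypothetical and the sketch ignores the `-lines` ranges that the real script passes; it only shows how the two commands fit together.

```python
import subprocess


def staged_format_commands(filename, binary='clang-format', style=None):
    """Return (git_argv, clang_format_argv) for formatting the staged copy."""
    # ':<path>' names the entry for <path> in the index rather than in a commit.
    git_argv = ['git', 'cat-file', 'blob', ':' + filename]
    # clang-format reads stdin here, so it needs the name to pick language/style.
    fmt_argv = [binary, '-assume-filename=' + filename]
    if style:
        fmt_argv.append('-style=' + style)
    return git_argv, fmt_argv


def format_staged_blob(filename, **kwargs):
    """Pipe the index version of filename through clang-format; return bytes."""
    git_argv, fmt_argv = staged_format_commands(filename, **kwargs)
    blob = subprocess.check_output(git_argv)
    done = subprocess.run(fmt_argv, input=blob, stdout=subprocess.PIPE,
                          check=True)
    return done.stdout
```

Because the working-directory file is never opened, the unstaged `printf` lines from the example above would not show up in the result.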

git-clang-format
104	Please move this down after `-q` so it stays sorted. Also I would drop the `-s` since this will be an uncommon option. Finally, could you add `aliases=['cached']` to mirror the behavior of `git diff`?
124	This will work without `--diff` (otherwise it will try to apply changes in the index to the working directory, which doesn't make sense), so could you please add a check that `--staged` requires `--diff`?
250	I suggest sticking with the name `staged` throughout the code.
328	There is already a git command that does this, `git write-tree`, so you can replace this entire function with a call to `run('git', 'write-tree')`.

This revision now requires changes to proceed. Feb 9 2016, 8:58 AM

lodato added inline comments. Feb 9 2016, 9:05 AM

git-clang-format
124	Oops, will not work.
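
The check requested for line 124 (reject `--staged` unless `--diff` is also given) could be sketched as below. This is a standalone illustration with a hypothetical `parse` helper, not the script's actual option handling; it registers `--cached` as a second option string, which is how the diff in this revision spells the alias.

```python
import argparse


def build_parser():
    p = argparse.ArgumentParser(prog='git clang-format')
    p.add_argument('--diff', action='store_true',
                   help='print a diff instead of applying the changes')
    # Two option strings share one destination: opts.staged.
    p.add_argument('--staged', '--cached', action='store_true',
                   help='consider only staged lines')
    return p


def parse(argv):
    """Parse argv and enforce that --staged is only used together with --diff."""
    p = build_parser()
    opts = p.parse_args(argv)
    if opts.staged and not opts.diff:
        p.error('--diff is required when --staged is given')  # exits with status 2
    return opts
```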

Hi - I'm coming here from Alexander's post on Stack Overflow. I'm interested to see how this solution is progressing (no replies for 1.5 years is not a good sign), to verify it still works, and to check whether there are better options available in case this is stuck in limbo and out of date.

Man, I have to admit it's really a shame that I didn't find time to work on this further, but I'm truly too busy these days. However, I believe the primary reason I didn't have the motivation to do this is that the flaw that was pointed out never actually bothered me in real life, simply because I've never ever hit this case in production. I confess that I use this solution, namely the one linked on Stack Overflow (to my personal Bitbucket repository), every single day, and we even introduced it in my company. Nobody has complained so far. It's true that sometimes one would want to stage only some changes and commit them while having other changes unstaged, but I don't remember when I was last in that situation. If you really want to leave something out for another commit, then you can also stash those unstaged changes before you format/commit the staged ones.

Furthermore, let me clarify why the proposal by @lodato might not even fit into the picture (i.e. there is no universal solution to this problem, as I see it so far). In particular, his example does not illustrate another side of the medal: let's say that the staged code was, in fact, badly formatted (unlike in his example), and then you apply some code on top of it that is not yet staged (same as in his example). By "on top" I mean exactly what he shows: those changes overlap, that is, if you'd stage them, they'd overwrite the previously staged ones (which in our imaginary example are badly formatted now).

Now let's think about what will happen if we follow his proposal. We'd apply formatting purely to the "staged" version of the file by piping it from the index as a blob directly to the formatter; so far so good. Or wait a minute, how would you actually format that file in place then? That is, you already have unstaged changes that potentially conflict with the ones you'd get as a result of formatting the staged ones, so how do you reconcile these two versions now? How do you put those formatted changes into the unstaged state when you already have something completely different, but also unstaged, at the same line? It turns out that the answer is: you can't, without introducing explicit conflicts in the unstaged tree, which is even more confusing to my taste. Or would you just report the diff with an error message to the user, leaving the act of applying it to them manually? You could, but then you give up on that cool feature of automatic formatting.

To conclude, which approach you take is all about pros and cons. On a daily basis and from a productivity standpoint, I care more about doing my changes for the target commit, trying to commit, getting the code automatically formatted in the unstaged tree if something is wrong, reviewing this unstaged diff quickly, staging it, and finally committing my work. That corner case with some irrelevant changes hanging in the unstaged tree and fooling the formatter can rarely be a problem. And even then, I treat it more like an answer from the formatter: "Hey bro, you have some unstaged crap out there, can you please already decide whether you need it now or not, otherwise I can't do my job for you?!", in which case I will

either actually stage them if I really need them,
or stash them if I want to deal with them later,
or discard them altogether if they are garbage,

all of which will allow the formatter to do its job and me to finally produce the commit.

Can anyone inform me why these don't work on clang-format-diff?

Further - I feel it should be simple to make this work properly like git-diff. Although I do think lodato's case is off the norm and prevents a usable solution, I can't understand why the bug/issue even arises, nor what the fix is, but it seems like it should be a simple thing.

Alright, so you got me excited about this task once again. Now, I've just rebased to the latest git-clang-format and it has changed a bit. Nevertheless, I've updated the previous patch accordingly and applied those changes which @lodato was proposing (except for getting rid of run_git_ls_files_and_save_to_tree as I'm still not sure whether this would be the same as git write-tree). Anyway, just tested the exact scenario posted by @lodato, and indeed with the current patch it works as he expects. Namely, there is no diff from unstaged tree interfering anymore. So far so good, but as I said in my previous post, there is another side of the medal. If you didn't get it, then read through it again and try, for example, the following amended @lodato's scenario:

$ mkdir tmp
$ cd tmp
$ git init
$ cat > foo.cc <<EOF
int main() {
  int x = 1;
  return 0;
}
EOF
$ git add foo.cc
$ git commit -m 'initial commit'
$ sed -i -e 's/1/\n2\n/g' foo.cc
$ git add foo.cc
$ rm foo.cc
$ cat > foo.cc <<EOF
int main() {
  printf("hello\n");
  printf("goodbye\n");
  return 0;
}
EOF

Notice the newlines around 2. If you now do git commit, those bad newlines in the staged tree will for sure get fixed but at the cost that you will lose your changes in unstaged tree without any notification. That is as I said, they will merely be overwritten by Clang-Format. There are several possibilities to improve the implementation from here that I can think of:

Introduce an explicit conflict in unstaged tree to have both previous unstaged changes and changes done by Clang-Format (to reformat the staged tree) for manual resolution by users;
Report an error with a diff of those unstaged lines which prevent Clang-Format from doing its job due to conflict, which is technically speaking already the same as it was before that patch;

Finally, regardless of what would be the choice between these two options, I'd reuse the --force option to forcefully overwrite the unstaged tree with the changes done by Clang-Format, essentially what the current patch does already anyway. What do you guys think? Contributions and suggestions are very welcome; let's have another round to get it merged upstream!
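
Both options above, as well as the `--force` escape hatch, first need to know whether the formatter's changes touch lines that also carry unstaged changes. A minimal sketch of that detection follows. Everything in it is hypothetical (`ranges_overlap`, `find_conflicts`, `may_apply`); it assumes both inputs map a filename to `(start_line, line_count)` pairs, like the script's `line_ranges`, and that both are expressed in line numbers of the same file version.

```python
def ranges_overlap(a, b):
    """True if two (start_line, line_count) ranges share at least one line."""
    a_start, a_count = a
    b_start, b_count = b
    return a_start < b_start + b_count and b_start < a_start + a_count


def find_conflicts(format_ranges, unstaged_ranges):
    """Map filename -> list of (format_range, unstaged_range) overlapping pairs."""
    conflicts = {}
    for filename, fmt in format_ranges.items():
        pairs = [(f, u)
                 for f in fmt
                 for u in unstaged_ranges.get(filename, [])
                 if ranges_overlap(f, u)]
        if pairs:
            conflicts[filename] = pairs
    return conflicts


def may_apply(format_ranges, unstaged_ranges, force=False):
    """Allow overwriting the working tree only without conflicts, or if forced."""
    return force or not find_conflicts(format_ranges, unstaged_ranges)
```

With such a check, the non-forced path could either report the conflicting ranges as an error or turn them into explicit conflict markers, depending on which of the two options is chosen.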

Did anybody have a chance to review it and/or try it out?

I'm paying attention at least. I updated your patch prior to your posting and temporarily made do with it. I'm pretty nervous that I will lose work with my commit style and the lingering issue. Based on what I've seen so far I can't use the git hooks, so I want to do this more manually, which invites more accidents, as that's when staged data is more frequent.

Sorry, I have been very busy with other work so I haven't had a chance to follow along. (I don't work on LLVM team - I just contributed this script.)

I'll try to carve out some time to review within the next week.

Hi @lodato, thanks mate.

Mark, just wanted to check if the review is still somewhere on your radar.

lodato mentioned this in D41147: git-clang-format: Add new --staged option. Dec 12 2017, 5:41 PM

I think the simplest solution to those problems is to require --diff. An alternative is to write the changes directly to the index without touching the working directory, but that would require some flag because the behavior is unintuitive, and the implementation would be complicated enough to warrant its own patch.

I reimplemented your patch in D41147 based on a significant refactoring, which I hope makes the code more clear.

Oh, and I meant to start with: I'm so sorry for the extremely long delay. I was swamped with work, and then I forgot about this. Please know that I appreciate your effort here and that I didn't mean to blow you off.

Best regards, Mark

MyDeveloperDay mentioned this in D90996: [clang-format] Add --staged/--cached option to git-clang-format. Oct 21 2021, 1:07 AM

MyDeveloperDay mentioned this in rGbee61aa7b638: [clang-format] Add --staged/--cached option to git-clang-format. Oct 30 2021, 9:39 AM

Revision Contents

Path              Size
git-clang-format  96 lines

Diff 112801

git-clang-format

Context not available.
                  help='select hunks interactively')
   p.add_argument('-q', '--quiet', action='count', default=0,
                  help='print less information')
+  p.add_argument('--staged', '--cached', action='store_true',
+                 help='consider only staged lines')
   p.add_argument('--style',
                  default=config.get('clangformat.style', None),
                  help='passed to clang-format'),
lodato: This will work without `--diff` (otherwise it will try to apply changes in the index to the working directory, which doesn't make sense), so could you please add a check that `--staged` requires `--diff`?
lodato: Oops, will not work.
Context not available.
   if len(commits) > 1:
     if not opts.diff:
       die('--diff is required when two commits are given')
+    if opts.staged:
+      die('--staged is not allowed when two commits are given')
   else:
     if len(commits) > 2:
       die('at most two commits allowed; %d given' % len(commits))
-  changed_lines = compute_diff_and_extract_lines(commits, files)
+  changed_lines = compute_diff_and_extract_lines(commits, files, opts.staged)
   if opts.verbose >= 1:
     ignored_files = set(changed_lines)
   filter_by_extension(changed_lines, opts.extensions.lower().split(','))
Context not available.
                                                  binary=opts.binary,
                                                  style=opts.style)
   else:
-    old_tree = create_tree_from_workdir(changed_lines)
+    if opts.staged:
+      old_tree = run_git_ls_files_and_save_to_tree(changed_lines)
+    else:
+      old_tree = create_tree_from_workdir(changed_lines)
     new_tree = run_clang_format_and_save_to_tree(changed_lines,
                                                  binary=opts.binary,
-                                                 style=opts.style)
+                                                 style=opts.style,
+                                                 staged=opts.staged)
   if opts.verbose >= 1:
     print('old tree: %s' % old_tree)
     print('new tree: %s' % new_tree)
lodato: I suggest sticking with the name `staged` throughout the code.
Context not available.
   return convert_string(stdout.strip())


-def compute_diff_and_extract_lines(commits, files):
+def compute_diff_and_extract_lines(commits, files, staged=False):
   """Calls compute_diff() followed by extract_lines()."""
-  diff_process = compute_diff(commits, files)
+  assert not staged or len(commits) < 2
+  diff_process = compute_diff(commits, files, staged)
   changed_lines = extract_lines(diff_process.stdout)
   diff_process.stdout.close()
   diff_process.wait()
Context not available.
   return changed_lines


-def compute_diff(commits, files):
+def compute_diff(commits, files, staged=False):
   """Return a subprocess object producing the diff from `commits`.

   The return value's `stdin` file object will produce a patch with the
   differences between the working directory and the first commit if a single
   one was specified, or the difference between both specified commits, filtered
   on `files` (if non-empty). Zero context lines are used in the patch."""
-  git_tool = 'diff-index'
+  assert not staged or len(commits) < 2
+  cmd = ['git']
   if len(commits) > 1:
-    git_tool = 'diff-tree'
-  cmd = ['git', git_tool, '-p', '-U0'] + commits + ['--']
+    cmd.append('diff-tree')
+  else:
+    cmd.append('diff-index')
+    if staged:
+      cmd.append('--cached')
+  cmd.extend(['-p', '-U0'] + commits + ['--'])
   cmd.extend(files)
   p = subprocess.Popen(cmd, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
   p.stdin.close()
lodato: There is already a git command that does this, `git write-tree`, so you can replace this entire function with a call to `run('git', 'write-tree')`.
Context not available.
   return create_tree(filenames, '--stdin')


+def run_git_ls_files_and_save_to_tree(changed_lines):
+  """Run git ls-files and save the result to a git tree.
+
+  Returns the object ID (SHA-1) of the created tree."""
+  index_path = os.environ.get('GIT_INDEX_FILE')
+  def iteritems(container):
+    try:
+      return container.iteritems() # Python 2
+    except AttributeError:
+      return container.items() # Python 3
+  def index_info_generator():
+    for filename, line_ranges in iteritems(changed_lines):
+      old_index_path = os.environ.get('GIT_INDEX_FILE')
+      if index_path is None:
+        del os.environ['GIT_INDEX_FILE']
+      else:
+        os.environ['GIT_INDEX_FILE'] = index_path
+      try:
+        mode, id, stage, filename = run('git', 'ls-files', '--stage', '--',
+                                        filename).split()
+        yield '%s %s\t%s' % (mode, id, filename)
+      finally:
+        if old_index_path is None:
+          del os.environ['GIT_INDEX_FILE']
+        else:
+          os.environ['GIT_INDEX_FILE'] = old_index_path
+  return create_tree(index_info_generator(), '--index-info')
+
+
 def run_clang_format_and_save_to_tree(changed_lines, revision=None,
-                                      binary='clang-format', style=None):
+                                      binary='clang-format', style=None,
+                                      staged=False):
   """Run clang-format on each file and save the result to a git tree.

   Returns the object ID (SHA-1) of the created tree."""
+  assert not staged or revision is None
+  index_path = os.environ.get('GIT_INDEX_FILE')
   def iteritems(container):
     try:
       return container.iteritems() # Python 2
Context not available.
       # Adjust python3 octal format so that it matches what git expects
       if mode.startswith('0o'):
         mode = '0' + mode[2:]
-      blob_id = clang_format_to_blob(filename, line_ranges,
-                                     revision=revision,
-                                     binary=binary,
-                                     style=style)
-      yield '%s %s\t%s' % (mode, blob_id, filename)
+      old_index_path = os.environ.get('GIT_INDEX_FILE')
+      if index_path is None:
+        del os.environ['GIT_INDEX_FILE']
+      else:
+        os.environ['GIT_INDEX_FILE'] = index_path
+      try:
+        blob_id = clang_format_to_blob(filename, line_ranges,
+                                       revision=revision,
+                                       binary=binary,
+                                       style=style,
+                                       staged=staged)
+        yield '%s %s\t%s' % (mode, blob_id, filename)
+      finally:
+        if old_index_path is None:
+          del os.environ['GIT_INDEX_FILE']
+        else:
+          os.environ['GIT_INDEX_FILE'] = old_index_path
   return create_tree(index_info_generator(), '--index-info')


Context not available.


 def clang_format_to_blob(filename, line_ranges, revision=None,
-                         binary='clang-format', style=None):
+                         binary='clang-format', style=None, staged=False):
   """Run clang-format on the given file and save the result to a git blob.

   Runs on the file in `revision` if not None, or on the file in the working
   directory if `revision` is None.

   Returns the object ID (SHA-1) of the created blob."""
+  assert not staged or revision is None
   clang_format_cmd = [binary]
   if style:
     clang_format_cmd.extend(['-style='+style])
   clang_format_cmd.extend([
       '-lines=%s:%s' % (start_line, start_line+line_count-1)
       for start_line, line_count in line_ranges])
-  if revision:
+  if staged or revision:
     clang_format_cmd.extend(['-assume-filename='+filename])
-    git_show_cmd = ['git', 'cat-file', 'blob', '%s:%s' % (revision, filename)]
+    git_show_cmd = ['git', 'cat-file', 'blob', '%s:%s' % (revision or '',
+                                                          filename)]
     git_show = subprocess.Popen(git_show_cmd, stdin=subprocess.PIPE,
                                 stdout=subprocess.PIPE)
     git_show.stdin.close()
Context not available.
