This is an archive of the discontinued LLVM Phabricator instance.

Drop '* text=auto' from .gitattributes and normalize
ClosedPublic

Authored by aaronpuchert on Apr 27 2022, 3:16 PM.

Details

Summary

Git wants to check in 'text' files with LF endings, so this changes them
in the repository but not in the checkout, where they keep CRLF endings.

Diff Detail

Event Timeline

modimo created this revision.Apr 27 2022, 3:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2022, 3:16 PM
Herald added subscribers: hoy, wenlei. · View Herald Transcript
modimo requested review of this revision.Apr 27 2022, 3:16 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 27 2022, 3:16 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
modimo edited the summary of this revision. (Show Details)Apr 27 2022, 3:21 PM
modimo retitled this revision from renormalize to Renormalize line endings after ac5f7be6a8688955a282becf00eebc542238a86b.Apr 27 2022, 3:24 PM
modimo edited the summary of this revision. (Show Details)
modimo added a reviewer: aaronpuchert.

The following files have their line endings (when checked out on disk) changed from CRLF to LF by this patch. Seems harmless, but I just wanted to confirm that it was expected:

clang-tools-extra/test/clang-tidy/checkers/Inputs/readability-duplicate-include/readability-duplicate-include.h
clang-tools-extra/test/clang-tidy/checkers/Inputs/readability-duplicate-include/readability-duplicate-include2.h
clang-tools-extra/test/clang-tidy/checkers/Inputs/readability-duplicate-include/system/sys/types.h
clang-tools-extra/test/clang-tidy/checkers/readability-else-after-return-if-constexpr.cpp
clang-tools-extra/test/modularize/Inputs/CompileError/module.modulemap
clang-tools-extra/test/modularize/Inputs/MissingHeader/module.modulemap
clang-tools-extra/test/pp-trace/Inputs/module.map

(The files which should be CRLF according to .gitattributes remained CRLF.)

Most of the files changed here are not affected by the .gitattributes file, such as

clang-tools-extra/test/clang-tidy/checkers/Inputs/readability-duplicate-include/readability-duplicate-include.h
clang-tools-extra/test/clang-tidy/checkers/Inputs/readability-duplicate-include/readability-duplicate-include2.h
clang-tools-extra/test/clang-tidy/checkers/Inputs/readability-duplicate-include/system/sys/types.h
clang-tools-extra/test/clang-tidy/checkers/readability-else-after-return-if-constexpr.cpp
clang-tools-extra/test/modularize/Inputs/CompileError/module.modulemap
clang-tools-extra/test/modularize/Inputs/MissingHeader/module.modulemap
clang-tools-extra/test/pp-trace/Inputs/module.map

The only affected files seem to be

clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp
clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp.expected

Looking back at D97625, the main goal of that change was to ensure that tests relying on LF endings also work on Windows, and the two files that want CRLF did not in fact cause test failures.

So maybe we should simply remove

# These test input files rely on two-byte Windows (CRLF) line endings.
clang-apply-replacements/Inputs/crlf/crlf.cpp text eol=crlf
clang-apply-replacements/Inputs/crlf/crlf.cpp.expected text eol=crlf

from the .gitattributes file again. Then we can keep them stored with CRLF endings, and each file has a comment anyway that they rely on them. The files have been there for many years and never been a problem.

Could you test if this works for you?

We can still discuss normalizing the other files, but maybe we can get some reviewers involved in clang-tools-extra for that.

So maybe we should simply remove

# These test input files rely on two-byte Windows (CRLF) line endings.
clang-apply-replacements/Inputs/crlf/crlf.cpp text eol=crlf
clang-apply-replacements/Inputs/crlf/crlf.cpp.expected text eol=crlf

from the .gitattributes file again. Then we can keep them stored with CRLF endings, and each file has a comment anyway that they rely on them. The files have been there for many years and never been a problem.

I *think* this would mean that if you're on Windows and have core.autocrlf set to input, when you commit changes to this files, Git will convert them back to LF line endings. Not 100% sure of that though.

If I check out this commit and then check out the previous commit, clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp and clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp.expected become modified in my working directory; their line endings are changed from CRLF to LF. That seems undesirable.

The following files have their line endings (when checked out on disk) changed from CRLF to LF by this patch. Seems harmless, but I just wanted to confirm that it was expected:

clang-tools-extra/test/clang-tidy/checkers/Inputs/readability-duplicate-include/readability-duplicate-include.h
clang-tools-extra/test/clang-tidy/checkers/Inputs/readability-duplicate-include/readability-duplicate-include2.h
clang-tools-extra/test/clang-tidy/checkers/Inputs/readability-duplicate-include/system/sys/types.h
clang-tools-extra/test/clang-tidy/checkers/readability-else-after-return-if-constexpr.cpp
clang-tools-extra/test/modularize/Inputs/CompileError/module.modulemap
clang-tools-extra/test/modularize/Inputs/MissingHeader/module.modulemap
clang-tools-extra/test/pp-trace/Inputs/module.map

(The files which should be CRLF according to .gitattributes remained CRLF.)

Good catch. Looking at git documentation (https://git-scm.com/docs/gitattributes#_text) by virtue of applying * text=auto the line endings will be stored internally as LF and then use the system settings on checkout. Looking on my linux box I see LF endings but checking this out on Windows I see CRLF endings so this behavior is correct. This diff is displaying index changes which does move from CRLF->LF. I didn't see any test failures on Linux and these files will not have changed on Windows so this should be good.

If I check out this commit and then check out the previous commit, clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp and clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp.expected become modified in my working directory; their line endings are changed from CRLF to LF. That seems undesirable.

Good catch, I see it as well. The .gitattributes change needs to be atomic with the renormalization change. If I checkout from main after the .gitattributes change (git checkout ac5f7be6a868) I'll see the files as dirty. However, if I checkout the revision before (git checkout ac5f7be6a868~1) this goes away. I think the way to go is to revert ac5f7be6a868 then land everything as a single stack to prevent this issue.

Regarding crlf.cpp(.expected), adding the modified changes to a commit indeed solved the problem I described in https://reviews.llvm.org/rGac5f7be6a8688955a282becf00eebc542238a86b#1080897 and I was working with that so far. Still don't know why Linux git doesn't do the same?

Don't know about the other files, but according to https://github.com/git/git/commit/af6e0fe3a589d58bfd508c1c6ccbeb38586eb06b, this seems the right thing to do.

I *think* this would mean that if you're on Windows and have core.autocrlf set to input, when you commit changes to this files, Git will convert them back to LF line endings. Not 100% sure of that though.

As far as I recall this was done because there are many ways how the line endings can be accidentally changed (core.autocrlf being one of then, others are clang-format or text editors that do not retain line endings) and committed. Removing the files from .gitignore undo the intended line ending pin from D97625.

I *think* this would mean that if you're on Windows and have core.autocrlf set to input, when you commit changes to this files, Git will convert them back to LF line endings. Not 100% sure of that though.

Ok, that's a good point.

Good catch. Looking at git documentation (https://git-scm.com/docs/gitattributes#_text) by virtue of applying * text=auto the line endings will be stored internally as LF and then use the system settings on checkout.

How about we remove * text=auto for now? Or would that break something?

Perhaps we can discuss * text=auto on the entire code base separately.

Still don't know why Linux git doesn't do the same?

Linux checks the modified time stamp before looking at file contents. After touching the files I can also reproduce it.

aaronpuchert added a comment.EditedApr 27 2022, 5:28 PM

How about we remove * text=auto for now? Or would that break something?

The documentation says that:

If you want to ensure that text files that any contributor introduces to the repository have their line endings normalized, you can set the text attribute to "auto" for all files.

*	text=auto

And for now we don't actually want that. So I'd say remove that line and normalize the two remaining files.

If I check out this commit and then check out the previous commit, clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp and clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp.expected become modified in my working directory; their line endings are changed from CRLF to LF. That seems undesirable.

In the current top-of-tree, the gitattributes line endings and the actual line endings contradict each other, leading to such inconsistencies.

I think the way to go is to revert ac5f7be6a868 then land everything as a single stack to prevent this issue.

Doesn't change anything about existing commits though. There is no way to fix that now I'm afraid. We can only fix it for future commits.

aaronpuchert edited the summary of this revision. (Show Details)
aaronpuchert removed a subscriber: drodriguez.

Drop * text=auto, so that we renormalize only the files that need it.

aaronpuchert retitled this revision from Renormalize line endings after ac5f7be6a8688955a282becf00eebc542238a86b to Drop '* text=auto' from .gitattributes and normalize.Apr 27 2022, 5:48 PM
aaronpuchert edited the summary of this revision. (Show Details)

I think the way to go is to revert ac5f7be6a868 then land everything as a single stack to prevent this issue.

Doesn't change anything about existing commits though. There is no way to fix that now I'm afraid. We can only fix it for future commits.

Yeah unfortunately. On Linux since it has the timestamp check landing this in one piece will be clean but on Windows the existing commits will continue to have this issue.

Would be nice if someone could accept this. I think this comes closest to the intention of D97625.

Drop * text=auto, so that we renormalize only the files that need it.

Makes sense to me, thanks for putting it up. If you want to commandeer I can accept the change or you can accept your own revision?

aaronpuchert commandeered this revision.Apr 27 2022, 5:58 PM
aaronpuchert edited reviewers, added: modimo; removed: aaronpuchert.

Ok, let's do it this way.

modimo accepted this revision.Apr 27 2022, 5:59 PM
This revision is now accepted and ready to land.Apr 27 2022, 5:59 PM
modimo requested changes to this revision.Apr 27 2022, 6:04 PM

Checking locally I'm seeing LF as the line ending in crlf.cpp in the working directory. Can you double check that everything matches up?

This revision now requires changes to proceed.Apr 27 2022, 6:04 PM
This revision was not accepted when it landed; it landed in state Needs Revision.Apr 27 2022, 6:07 PM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.

Checking locally I'm seeing LF as the line ending in crlf.cpp in the working directory. Can you double check that everything matches up?

I had this too, but checking out again seems to have fixed it. Maybe you used arc patch like I did? I presume that applies just textually and doesn't know about .gitattributes.

And sorry for landing already, there seems to have been a race...

Checking locally I'm seeing LF as the line ending in crlf.cpp in the working directory. Can you double check that everything matches up?

Ah it was some strange setup on my end. Confirmed patch and commit in main are good with CRLF. Apologies for the noise.

Checking locally I'm seeing LF as the line ending in crlf.cpp in the working directory. Can you double check that everything matches up?

I had this too, but checking out again seems to have fixed it. Maybe you used arc patch like I did? I presume that applies just textually and doesn't know about .gitattributes.

And sorry for landing already, there seems to have been a race...

Ah that would explain it, I used arc patch and also saw the same thing. It was fixed after I switched off arcpatch-D124563 and back. Anyways it's all good on main which is what matters.

I used arc patch and also saw the same thing.

The patch does actually change the files to LF endings. So just applying the patch with non-Git tools will make LF endings, but Git will apply the LF -> CRLF transformation when it checks out itself. Git doesn't show the file as modified because after cleaning the file (i.e. applying CRLF -> LF) it's the same as in the index.

Sorry for all the noise, I was just annoyed about this empty test directory and thought we just need to move that file... well, it was a bit of an adventure. Thanks for helping out here.

I used arc patch and also saw the same thing.

The patch does actually change the files to LF endings. So just applying the patch with non-Git tools will make LF endings, but Git will apply the LF -> CRLF transformation when it checks out itself. Git doesn't show the file as modified because after cleaning the file (i.e. applying CRLF -> LF) it's the same as in the index.

To confirm in main:

~/llvm-project2# git ls-files --eol clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp
i/lf    w/crlf  attr/text eol=crlf      clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp

i/lf indicates in the index it's stored as LF but transformed to w/crlf CRLF in the working directory.

After running arc patch though:

~/llvm-project# llvm-arc patch D124563
~/llvm-project# git ls-files --eol clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp
i/lf    w/lf    attr/text eol=crlf      clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp

So confirmed it was an arc patch diff application. EOL is... tricky.

Sorry for all the noise, I was just annoyed about this empty test directory and thought we just need to move that file... well, it was a bit of an adventure. Thanks for helping out here.

It happens, properly fixing the original diff to make it actually do something was definitely the right choice. If anything Git should warn loudly that the index needs to be refreshed if .gitattributes is modified or added.

Happy to help, I learned quite a lot about git internals digging into this :)

modimo reopened this revision.Apr 27 2022, 6:43 PM
modimo accepted this revision.
This revision is now accepted and ready to land.Apr 27 2022, 6:43 PM
modimo closed this revision.Apr 27 2022, 6:43 PM

Git will apply the LF -> CRLF transformation when it checks out itself. Git doesn't show the file as modified because after cleaning the file (i.e. applying CRLF -> LF) it's the same as in the index.

One way to look at this is that the two filters clean (for checking in) and smudge (for checking out) are neither left nor right inverses of each other.

  • The composition cleansmudge not being the identity caused our original problem of files showing up modified with no way to fix it: no checked-out file could reproduce the checked-in file.
  • The composition smudgeclean not being the identity caused the intermittent issue after arc patch: the file didn't show up as modified, even though it wasn't what Git would produce on checkout.

The documentation says that clean should be a projection:

For best results, clean should not alter its output further if it is run twice ("clean→clean" should be equivalent to "clean")

We should also have cleansmudgeclean = clean, and that's what git add --renormalize uses: by checking in cleaned files we're in the subset of files on which cleansmudge is the identity. In fact in our case clean should even be the equalizer of smudgeclean and the identity on checked-in files, that is be able to produce every file that survives the round-trip.

In case you needed an advertisement for category theory. ;)

nikic added a subscriber: nikic.Apr 28 2022, 1:28 AM

This change seems to have broken switching to a branch before the commit completely. The only way I was able to recover my git checkout is to comment out the * text=auto line and then run git reset --hard, otherwise the crlf.cpp files always show up as changed.

It also seems weird to me that a file that relies on CRLF line endings is checked in explicitly with LF line endings. A better approach IMHO would be to check it in as binary.

I'm proposing this: D124606.

This change seems to have broken switching to a branch before the commit completely. The only way I was able to recover my git checkout is to comment out the * text=auto line and then run git reset --hard, otherwise the crlf.cpp files always show up as changed.

We discussed this previously, there is no way to fix that now. We should have normalized in the same change that added the .gitattributes. Any checkout between those two changes will now show bogus modified files. Sometimes it happens to work out because Git apparently looks at file time stamps before checking the contents, so it doesn't see an issue with those files. But if you touch them they will again show up as modified.

It also seems weird to me that a file that relies on CRLF line endings is checked in explicitly with LF line endings. A better approach IMHO would be to check it in as binary.

These are all text files though, not binaries. Checking it in as binary will not show any diffs.

As a temporary workaround for switching between branches, just add a local file clang-tools-extra/test/clang-apply-replacements/.gitattributes with contents:

Inputs/crlf/crlf.cpp binary
Inputs/crlf/crlf.cpp.expected binary

This will remove all kinds of filtering for the moment. When you remove that file, just make sure to force checkout both files again to make sure the filter applies.

It also seems weird to me that a file that relies on CRLF line endings is checked in explicitly with LF line endings. A better approach IMHO would be to check it in as binary.

These are all text files though, not binaries. Checking it in as binary will not show any diffs.

See D124606. -text seems to be the right compromise.

See D124606. -text seems to be the right compromise.

To quote an earlier comment from this thread:

I *think* this would mean that if you're on Windows and have core.autocrlf set to input, when you commit changes to this files, Git will convert them back to LF line endings. Not 100% sure of that though.

Which convinced me that D97625 was right: we do want the line endings to remain CRLF, and not have them accidentally destroyed. With -text we say that we don't care, which is not true.

I *think* this would mean that if you're on Windows and have core.autocrlf set to input, when you commit changes to this files, Git will convert them back to LF line endings. Not 100% sure of that though.

Which convinced me that D97625 was right: we do want the line endings to remain CRLF, and not have them accidentally destroyed. With -text we say that we don't care, which is not true.

I verified manually on Windows that the behavior is still what we want. D124606 explains that -text means that core.autocrlf is ignored.