Git wants to check in 'text' files with LF endings, so this changes them
in the repository but not in the checkout, where they keep CRLF endings.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The following files have their line endings (when checked out on disk) changed from CRLF to LF by this patch. Seems harmless, but I just wanted to confirm that it was expected:
clang-tools-extra/test/clang-tidy/checkers/Inputs/readability-duplicate-include/readability-duplicate-include.h clang-tools-extra/test/clang-tidy/checkers/Inputs/readability-duplicate-include/readability-duplicate-include2.h clang-tools-extra/test/clang-tidy/checkers/Inputs/readability-duplicate-include/system/sys/types.h clang-tools-extra/test/clang-tidy/checkers/readability-else-after-return-if-constexpr.cpp clang-tools-extra/test/modularize/Inputs/CompileError/module.modulemap clang-tools-extra/test/modularize/Inputs/MissingHeader/module.modulemap clang-tools-extra/test/pp-trace/Inputs/module.map
(The files which should be CRLF according to .gitattributes remained CRLF.)
Most of the files changed here are not affected by the .gitattributes file, such as
clang-tools-extra/test/clang-tidy/checkers/Inputs/readability-duplicate-include/readability-duplicate-include.h clang-tools-extra/test/clang-tidy/checkers/Inputs/readability-duplicate-include/readability-duplicate-include2.h clang-tools-extra/test/clang-tidy/checkers/Inputs/readability-duplicate-include/system/sys/types.h clang-tools-extra/test/clang-tidy/checkers/readability-else-after-return-if-constexpr.cpp clang-tools-extra/test/modularize/Inputs/CompileError/module.modulemap clang-tools-extra/test/modularize/Inputs/MissingHeader/module.modulemap clang-tools-extra/test/pp-trace/Inputs/module.map
The only affected files seem to be
clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp.expected
Looking back at D97625, the main goal of that change was to ensure that tests relying on LF endings also work on Windows, and the two files that want CRLF did not in fact cause test failures.
So maybe we should simply remove
# These test input files rely on two-byte Windows (CRLF) line endings. clang-apply-replacements/Inputs/crlf/crlf.cpp text eol=crlf clang-apply-replacements/Inputs/crlf/crlf.cpp.expected text eol=crlf
from the .gitattributes file again. Then we can keep them stored with CRLF endings, and each file has a comment anyway that they rely on them. The files have been there for many years and never been a problem.
Could you test if this works for you?
We can still discuss normalizing the other files, but maybe we can get some reviewers involved in clang-tools-extra for that.
I *think* this would mean that if you're on Windows and have core.autocrlf set to input, when you commit changes to this files, Git will convert them back to LF line endings. Not 100% sure of that though.
If I check out this commit and then check out the previous commit, clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp and clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp.expected become modified in my working directory; their line endings are changed from CRLF to LF. That seems undesirable.
Good catch. Looking at git documentation (https://git-scm.com/docs/gitattributes#_text) by virtue of applying * text=auto the line endings will be stored internally as LF and then use the system settings on checkout. Looking on my linux box I see LF endings but checking this out on Windows I see CRLF endings so this behavior is correct. This diff is displaying index changes which does move from CRLF->LF. I didn't see any test failures on Linux and these files will not have changed on Windows so this should be good.
Good catch, I see it as well. The .gitattributes change needs to be atomic with the renormalization change. If I checkout from main after the .gitattributes change (git checkout ac5f7be6a868) I'll see the files as dirty. However, if I checkout the revision before (git checkout ac5f7be6a868~1) this goes away. I think the way to go is to revert ac5f7be6a868 then land everything as a single stack to prevent this issue.
Regarding crlf.cpp(.expected), adding the modified changes to a commit indeed solved the problem I described in https://reviews.llvm.org/rGac5f7be6a8688955a282becf00eebc542238a86b#1080897 and I was working with that so far. Still don't know why Linux git doesn't do the same?
Don't know about the other files, but according to https://github.com/git/git/commit/af6e0fe3a589d58bfd508c1c6ccbeb38586eb06b, this seems the right thing to do.
As far as I recall this was done because there are many ways how the line endings can be accidentally changed (core.autocrlf being one of then, others are clang-format or text editors that do not retain line endings) and committed. Removing the files from .gitignore undo the intended line ending pin from D97625.
Ok, that's a good point.
How about we remove * text=auto for now? Or would that break something?
Perhaps we can discuss * text=auto on the entire code base separately.
Linux checks the modified time stamp before looking at file contents. After touching the files I can also reproduce it.
The documentation says that:
If you want to ensure that text files that any contributor introduces to the repository have their line endings normalized, you can set the text attribute to "auto" for all files.
* text=auto
And for now we don't actually want that. So I'd say remove that line and normalize the two remaining files.
In the current top-of-tree, the gitattributes line endings and the actual line endings contradict each other, leading to such inconsistencies.
Doesn't change anything about existing commits though. There is no way to fix that now I'm afraid. We can only fix it for future commits.
Yeah unfortunately. On Linux since it has the timestamp check landing this in one piece will be clean but on Windows the existing commits will continue to have this issue.
Would be nice if someone could accept this. I think this comes closest to the intention of D97625.
Makes sense to me, thanks for putting it up. If you want to commandeer I can accept the change or you can accept your own revision?
Checking locally I'm seeing LF as the line ending in crlf.cpp in the working directory. Can you double check that everything matches up?
I had this too, but checking out again seems to have fixed it. Maybe you used arc patch like I did? I presume that applies just textually and doesn't know about .gitattributes.
And sorry for landing already, there seems to have been a race...
Ah it was some strange setup on my end. Confirmed patch and commit in main are good with CRLF. Apologies for the noise.
Ah that would explain it, I used arc patch and also saw the same thing. It was fixed after I switched off arcpatch-D124563 and back. Anyways it's all good on main which is what matters.
The patch does actually change the files to LF endings. So just applying the patch with non-Git tools will make LF endings, but Git will apply the LF -> CRLF transformation when it checks out itself. Git doesn't show the file as modified because after cleaning the file (i.e. applying CRLF -> LF) it's the same as in the index.
Sorry for all the noise, I was just annoyed about this empty test directory and thought we just need to move that file... well, it was a bit of an adventure. Thanks for helping out here.
To confirm in main:
~/llvm-project2# git ls-files --eol clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp i/lf w/crlf attr/text eol=crlf clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp
i/lf indicates in the index it's stored as LF but transformed to w/crlf CRLF in the working directory.
After running arc patch though:
~/llvm-project# llvm-arc patch D124563 ~/llvm-project# git ls-files --eol clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp i/lf w/lf attr/text eol=crlf clang-tools-extra/test/clang-apply-replacements/Inputs/crlf/crlf.cpp
So confirmed it was an arc patch diff application. EOL is... tricky.
Sorry for all the noise, I was just annoyed about this empty test directory and thought we just need to move that file... well, it was a bit of an adventure. Thanks for helping out here.
It happens, properly fixing the original diff to make it actually do something was definitely the right choice. If anything Git should warn loudly that the index needs to be refreshed if .gitattributes is modified or added.
Happy to help, I learned quite a lot about git internals digging into this :)
One way to look at this is that the two filters clean (for checking in) and smudge (for checking out) are neither left nor right inverses of each other.
- The composition clean ∘ smudge not being the identity caused our original problem of files showing up modified with no way to fix it: no checked-out file could reproduce the checked-in file.
- The composition smudge ∘ clean not being the identity caused the intermittent issue after arc patch: the file didn't show up as modified, even though it wasn't what Git would produce on checkout.
The documentation says that clean should be a projection:
For best results, clean should not alter its output further if it is run twice ("clean→clean" should be equivalent to "clean")
We should also have clean ∘ smudge ∘ clean = clean, and that's what git add --renormalize uses: by checking in cleaned files we're in the subset of files on which clean ∘ smudge is the identity. In fact in our case clean should even be the equalizer of smudge ∘ clean and the identity on checked-in files, that is be able to produce every file that survives the round-trip.
In case you needed an advertisement for category theory. ;)
This change seems to have broken switching to a branch before the commit completely. The only way I was able to recover my git checkout is to comment out the * text=auto line and then run git reset --hard, otherwise the crlf.cpp files always show up as changed.
It also seems weird to me that a file that relies on CRLF line endings is checked in explicitly with LF line endings. A better approach IMHO would be to check it in as binary.
We discussed this previously, there is no way to fix that now. We should have normalized in the same change that added the .gitattributes. Any checkout between those two changes will now show bogus modified files. Sometimes it happens to work out because Git apparently looks at file time stamps before checking the contents, so it doesn't see an issue with those files. But if you touch them they will again show up as modified.
These are all text files though, not binaries. Checking it in as binary will not show any diffs.
As a temporary workaround for switching between branches, just add a local file clang-tools-extra/test/clang-apply-replacements/.gitattributes with contents:
Inputs/crlf/crlf.cpp binary Inputs/crlf/crlf.cpp.expected binary
This will remove all kinds of filtering for the moment. When you remove that file, just make sure to force checkout both files again to make sure the filter applies.
These are all text files though, not binaries. Checking it in as binary will not show any diffs.
See D124606. -text seems to be the right compromise.
To quote an earlier comment from this thread:
Which convinced me that D97625 was right: we do want the line endings to remain CRLF, and not have them accidentally destroyed. With -text we say that we don't care, which is not true.
I verified manually on Windows that the behavior is still what we want. D124606 explains that -text means that core.autocrlf is ignored.