This is an archive of the discontinued LLVM Phabricator instance.

Use `-text` git attribute instead of `text eol=...'
ClosedPublic

Authored by MForster on Apr 28 2022, 3:58 AM.

Details

Summary

These automatic conversions lead to issues in various workflows, and all
we want here are files that retain their line endings under all
circumstances. binary captures that perfectly well and leads to fewer
issues.

Diff Detail

Event Timeline

MForster created this revision.Apr 28 2022, 3:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2022, 3:58 AM
Herald added a subscriber: jdoerfert. · View Herald Transcript
MForster requested review of this revision.Apr 28 2022, 3:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 28 2022, 3:58 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
MForster updated this revision to Diff 425742.Apr 28 2022, 4:04 AM

Force-add crlf.cpp(.expected) with CRLF line endings.

I believe (based on some similar issues I was solving in lldb) that -text instead of binary is sufficient to prevent git from fidlling with the contents, while maintaining the ability to diff the files.

Other than that, I am in favour of this approach, but I'd recommend getting this reviewed by someone on the clang-tools team.

MForster updated this revision to Diff 425745.Apr 28 2022, 4:40 AM

Use -text instead of binary.

ilya-biryukov accepted this revision.Apr 28 2022, 5:24 AM
This revision is now accepted and ready to land.Apr 28 2022, 5:24 AM

NIT: change the commit message to say -text instead of binary

This revision was automatically updated to reflect the committed changes.

NIT: change the commit message to say -text instead of binary

Ah, sorry, saw this too late...

MForster retitled this revision from Use `binary` git attribute instead of `text eol=...' to Use `-text` git attribute instead of `text eol=...'.Apr 28 2022, 5:30 AM

NIT: change the commit message to say -text instead of binary

Ah, sorry, saw this too late...

Turns out I had already renamed the commit :-). I also updated the code review title...

I think this change broke again what D97625 was trying to fix. These are text files and we want them to keep the specified line endings regardless of the local git configuration. See also @smeenai's comment D124563#3478625.

clang-tools-extra/test/.gitattributes
7–15

My understanding is that -text removes an attribute, but what attribute is there to remove? There is no global .gitattributes that would set it in the first place. This change just reverts to the status quo before D97625, or rather before that file was moved.

labath added inline comments.Apr 28 2022, 7:14 AM
clang-tools-extra/test/.gitattributes
7–15

Are you sure about that? Git seems to distinguish between "unset" and "unspecified" values of an attribute:

Each attribute can be in one of these states for a given path:

Set
    The path has the attribute with special value "true"; this is specified by listing only the
    name of the attribute in the attribute list.

Unset
    The path has the attribute with special value "false"; this is specified by listing the
    name of the attribute prefixed with a dash - in the attribute list.

Set to a value
    The path has the attribute with specified string value; this is specified by listing the
    name of the attribute followed by an equal sign = and its value in the attribute list.

Unspecified
    No pattern matches the path, and nothing says if the path has or does not have the
    attribute, the attribute for the path is said to be Unspecified.

And then it says this about the text attribute in particular:

Set
    Setting the text attribute on a path enables end-of-line normalization and marks the
    path as a text file. End-of-line conversion takes place without guessing the content
    type.

Unset
    Unsetting the text attribute on a path tells Git not to attempt any end-of-line
    conversion upon checkin or checkout.

Set to string value "auto"
    When text is set to "auto", the path is marked for automatic end-of-line conversion. If
    Git decides that the content is text, its line endings are converted to LF on checkin.
    When the file has been committed with CRLF, no conversion is done.

Unspecified
    If the text attribute is unspecified, Git uses the core.autocrlf configuration variable
    to determine if the file should be converted.

I think this change broke again what D97625 was trying to fix. These are text files and we want them to keep the specified line endings regardless of the local git configuration. See also @smeenai's comment D124563#3478625.

This change should be doing exactly that.
The behaviour in the mentioned comment should apply when the text attribute is unspecified, not when it's explicitly unset.
From git docs:

Unset
Unsetting the text attribute on a path tells Git not to attempt any end-of-line conversion upon checkin or checkout.

Unspecified
If the text attribute is unspecified, Git uses the core.autocrlf configuration variable to determine if the file should be converted.
clang-tools-extra/test/.gitattributes
7–15

The git docs seem to be pretty clear that unsetting the attribute should do the right thing:

Unset
Unsetting the text attribute on a path tells Git not to attempt any end-of-line conversion upon checkin or checkout.

It seems git is supposed to do the right thing for us here, unless I misunderstand something.

@labath, ah, sorry, you beat me to it.

This comment was removed by MForster.
aaronpuchert added a comment.EditedApr 28 2022, 8:07 AM

Fair enough, but don't we want to enforce LF or CRLF, respectively? An editor could inadvertently change the line endings, and someone might not notice before committing. Explicitly specifying the right line endings gives us the proper line endings even if some editors mess things up.

Also I still don't understand what specifically this is fixing. What exactly was wrong about the previous configuration?

The commit message talks about "various workflows" but which workflows are that? Does that mean no one can use .gitattributes with something else than -text in LLVM?

Fair enough, but don't we want to enforce LF or CRLF, respectively?

Sure, but is the version control system the right tool to do that? I think it'd be better to have the test itself confirm the consistency of the data (or convert it into the right format at runtime) and not rely on magical conversions within the VC tool.

An editor could inadvertently change the line endings, and someone might not notice before committing.

From what I understood, most of these tests would immediately break (due to hardcoded offsets) if the line endings were changed.

Also I still don't understand what specifically this is fixing. What exactly was wrong about the previous configuration?

The branch switching issue (discussed in D124563) is one. We also ran into problems when importing this into our version control system (as it did not do the .gitattributes conversion) -- although one could say that is not an upstream problem...

An editor could inadvertently change the line endings, and someone might not notice before committing.

From what I understood, most of these tests would immediately break (due to hardcoded offsets) if the line endings were changed.

Ok, that's a valid point.

The branch switching issue (discussed in D124563) is one.

That is an issue, but I don't think this change fixes it. D124563 did normalize the file endings and switching between branches after that should work fine. Switching between branches before that will remain broken because the files had not been normalized. I don't think any change can fix that because they can't retroactively change the inconsistent snapshots.

We also ran into problems when importing this into our version control system (as it did not do the .gitattributes conversion) -- although one could say that is not an upstream problem...

Got it, so some implementations simply can't deal with this.

LGTM

This and the commit (@MForster ) was too hasty. There should have been time for people discussing D124563 and D97625 to react, and should have been added as reviewers. I didn't even see @ilya-biryukov participating in the other discussions.

LGTM

This and the commit (@MForster ) was too hasty. There should have been time for people discussing D124563 and D97625 to react, and should have been added as reviewers. I didn't even see @ilya-biryukov participating in the other discussions.

We committed this to unbreak our internal integrate at Google. Integrates are time-critical for us, so we normally choose to unbreak early and have post-commit discussions instead.
We should have probably communicated this better in commit messages and reviews.