This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Always open files using UTF-8 encoding
ClosedPublic

Authored by werat on Jul 26 2021, 7:37 AM.

Details

Summary

The encoding used for opening files depends on the OS and might be different
from UTF-8 (e.g. on Windows it can be CP-1252). The documentation files use
UTF-8 and might be incompatible with other encodings. For example, right now
clang-tools-extra/docs/clang-tidy/checks/abseil-no-internal-dependencies.rst
has non-ASCII quotes and running add_new_check.py fails on Windows, because
it tries to read the file with incompatible encoding.

Use io.open for compatibility with both Python 2 and Python 3.

Diff Detail

Event Timeline

werat created this revision.Jul 26 2021, 7:37 AM
werat requested review of this revision.Jul 26 2021, 7:37 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 26 2021, 7:37 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript

Hi! Thanks for the patch, it makes sense to me! Honestly, I think having the open shadow is maybe not the best way to solve this (please feel free to let me know if there are reasons it would be better), maybe spell it out explicitly (there aren't that many cases of open)?

For example, right now clang-tools-extra/docs/clang-tidy/checks/abseil-no-internal-dependencies.rst has non-ASCII quotes

Also, hm... Do we really need these? It's safe to remove those, this shouldn't be a problem, I think.

werat added a comment.Jul 29 2021, 1:47 AM

Hi! Thanks for the patch, it makes sense to me! Honestly, I think having the open shadow is maybe not the best way to solve this (please feel free to let me know if there are reasons it would be better), maybe spell it out explicitly (there aren't that many cases of open)?

It seemed a bit tedious to spell it out explicitly everywhere, but you're right, not a script of this size it's not a problem. The benefit of a global shadow is that nobody will use regular open accidentally, but I think the risk of that is relatively low.

Also, hm... Do we really need these? It's safe to remove those, this shouldn't be a problem, I think.

I don't think using non-ASCII quotes was intentional and we can definitely remove them. But future-proofing this script will avoid this problem in the future.

werat updated this revision to Diff 362677.Jul 29 2021, 1:48 AM

Use io.open explicitly.

werat updated this revision to Diff 362678.Jul 29 2021, 1:50 AM

Fix incorrect replace

Makes sense, thank you for the explanation. I've changed the problematic doc but this is probably still fine. Honestly, I don't really think we should have more Unicode symbols in the docs for generators hence this patch might not be needed but I'd be OK if you really still want to land it. The "problem" is that we'd be left with somewhat non-uniformity (there are many other Python scripts with open and inputs there might also have similar issues) but I guess that's what we have anyway.

clang-tools-extra/clang-tidy/add_new_check.py
19

The comments look misplaced now. Maybe add it next to the first io.open and end with "Here and elsewhere"?

werat added a comment.Jul 29 2021, 3:00 AM

Honestly, I don't really think we should have more Unicode symbols in the docs for generators

I'm not very familiar with the documentation build process, but maybe we could enforce only-ASCII there? Then all scripts and tools can just assume it will work with plan open.

kbobyrev requested changes to this revision.Jul 29 2021, 9:29 AM

It's certainly doable but looks like a complete overkill for this kind of problem :) The patch is fine, I'm okay with landing it if you're expecting similar problem to arise in the future, I'll accept with the comment next to the first usage, everything else is OK.

This revision now requires changes to proceed.Jul 29 2021, 9:29 AM
werat updated this revision to Diff 363413.Aug 2 2021, 2:14 AM

Moved the comment to the first usage of io.open.

kbobyrev accepted this revision.Aug 2 2021, 2:33 AM

Thanks!

This revision is now accepted and ready to land.Aug 2 2021, 2:33 AM
This revision was landed with ongoing or failed builds.Aug 2 2021, 2:37 AM
This revision was automatically updated to reflect the committed changes.