This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Fix pr48613: "llvm-header-guard uses a reserved identifier"
ClosedPublic

Authored by salman-javed-nz on Nov 18 2021, 2:39 AM.

Details

Summary

Fixes https://bugs.llvm.org/show_bug.cgi?id=48613.

llvm-header-guard is suggesting header guards with leading underscores
if the header file path begins with a '/' or similar special character.
Only reserved identifiers should begin with an underscore.

Diff Detail

Event Timeline

salman-javed-nz requested review of this revision.Nov 18 2021, 2:39 AM
aaron.ballman requested changes to this revision.Nov 18 2021, 4:42 AM
aaron.ballman added inline comments.
clang-tools-extra/clang-tidy/utils/HeaderGuard.cpp
28

This fixes some issues but I think it introduces new issues too. The old code would accept Unicode characters and pass them along as (valid) identifier characters. The new code will replace all the Unicode characters with _ which means some people's header guards may go from useful to _______________. We should definitely add test cases for Unicode file names.

Ultimately, I think this functionality will want to use UnicodeCharSets.h to implement something along the lines of what's done in isAllowedInitiallyIDChar() and isAllowedIDChar() from Lexer.cpp.

This revision now requires changes to proceed.Nov 18 2021, 4:42 AM
whisperity added inline comments.Nov 18 2021, 4:53 AM
clang-tools-extra/clang-tidy/utils/HeaderGuard.cpp
42–46

Is this true? At least in C++ (and perhaps in C) I believe _foo is a non-reserved identifier, only __foo or _Foo would be reserved.

clang-tools-extra/clang-tidy/utils/HeaderGuard.h
41

valid non-reserved

aaron.ballman added inline comments.Nov 18 2021, 5:24 AM
clang-tools-extra/clang-tidy/utils/HeaderGuard.cpp
42–46

_foo is reserved if its an external identifier in both C and C++: https://godbolt.org/z/GnG4v33vK

However, the issue here is that header guards are capitalized, so a file name like _foo still should drop the leading underscore so it doesn't get turned into _FOO which is always reserved.

clang-tools-extra/clang-tidy/utils/HeaderGuard.cpp
28

isAllowed*IDChar()does exactly what I want, but is a static function in Lexer.cpp.
Should I be changing its declaration so that it's exposed via Lexer.h?
I don't know what the bar is to warrant changing core public Clang API.

If I can just call the function directly from llvm-header-guard, I can avoid code duplication.

aaron.ballman added inline comments.Nov 18 2021, 11:10 AM
clang-tools-extra/clang-tidy/utils/HeaderGuard.cpp
28

We could expose the interface from Lexer if we feel that's the right approach. I was going back and forth on whether it was the right approach and eventually thought that we don't want to expose it directly because I don't think we want clang-tidy to care about things like LangOpts.AsmPreprocessor or even LangOpts.DollarIdents.

So my recommendation is to put this into LexerUtils.h/.cpp in clang-tidy rather than changing the Clang Lexer interface. WDYT?

salman-javed-nz marked an inline comment as not done.Nov 18 2021, 3:30 PM
salman-javed-nz added inline comments.
clang-tools-extra/clang-tidy/utils/HeaderGuard.cpp
28

(For brevity, when I say isAllowedIDChar, I mean both isAllowedIDChar and isAllowedInitiallyIDChar)

I see two possible directions:

If I expose the interface from Lexer.h, then in Clang-Tidy, it's only a matter of calling
isAllowedIDChar(C, Context->getLangOpts()). Clang-Tidy doesn't need to care about the contents of LangOpts - it just needs to pass them on as-is to the isAllowedIDChar() function.
getLangOpts() is already used all over the Clang-Tidy code base.

The other way is to create a copy of isAllowedIDChar() in the Clang-Tidy project and take out the parts to do with LangOpts.AsmPreprocessor and LangOpts.DollarIdents. Now we have an implementation tailored to just the things Clang-Tidy cares about. I will need to add clangLex to CMakeLists.txt to gain visibility of clang/lib/Lex/UnicodeCharSets.h's character tables.

salman-javed-nz marked an inline comment as not done.Nov 18 2021, 4:18 PM
salman-javed-nz added inline comments.
clang-tools-extra/clang-tidy/utils/HeaderGuard.cpp
28

I need to make a correction to my previous comment (I swear Phabricator had an "edit comment" button but I just don't see it at the moment).

The other way is to create a copy of isAllowedIDChar() in the Clang-Tidy project and take out the parts to do with LangOpts.AsmPreprocessor and LangOpts.DollarIdents. Now we have an implementation tailored to just the things Clang-Tidy cares about. I will need to add clangLex to CMakeLists.txt to gain visibility of clang/lib/Lex/UnicodeCharSets.h's character tables.

Of course clangTidyUtils already depends on clangLex, so we don't need another add_clang_library line in CMakeLists.txt, contrary to what I said earlier. However, UnicodeCharSets.h is still doesn't come up in Clang-Tidy's header search path because it doesn't sit in the include/clang folder. The file will need moving.

salman-javed-nz planned changes to this revision.Nov 19 2021, 3:02 AM
salman-javed-nz updated this revision to Diff 388699.EditedNov 20 2021, 5:04 AM
salman-javed-nz edited the summary of this revision. (Show Details)

Back out the changes to the "replace invalid characters in an identifier with underscores" feature.
Making this feature work with Unicode characters on different operating systems significantly increases the amount of code change required.
This can always be revisited in another patch.

In this patch, let's focus on fixing the most pressing problem at hand first: header guards starting with underscores.

salman-javed-nz marked 4 inline comments as done.EditedNov 20 2021, 5:34 AM

I reverted my changes to do with the invalid character substitution. Doing something akin to isAllowedInitiallyIDChar() and isAllowedIDChar() in Lexer.cpp will require converting from char* to UTF32*. Windows complicates things by using its own code page (typically Windows-1252). Will require a lot of careful consideration to implement correctly.

I have reverted this part of the patch for now, as it is not mentioned by the PR.

aaron.ballman accepted this revision.Nov 29 2021, 12:24 PM

I reverted my changes to do with the invalid character substitution. Doing something akin to isAllowedInitiallyIDChar() and isAllowedIDChar() in Lexer.cpp will require converting from char* to UTF32*. Windows complicates things by using its own code page (typically Windows-1252). Will require a lot of careful consideration to implement correctly.

I have reverted this part of the patch for now, as it is not mentioned by the PR.

I think reverting the changes was a good idea -- we can handle that separately as it is more complex and this solves a real problem. However, I'm not certain that Windows complicates anything here as Clang accepts UTF-8 input. We have -finput-charset but that only supports UTF-8 currently.

LGTM, modulo a commenting nit.

clang-tools-extra/clang-tidy/utils/HeaderGuard.cpp
169

This wasn't actually a typo -- I think it meant "if and only if" (https://en.wikipedia.org/wiki/If_and_only_if). Feel free to expand the comment instead of using iff.

This revision is now accepted and ready to land.Nov 29 2021, 12:24 PM

Update "iff" comment based on review.