This is an archive of the discontinued LLVM Phabricator instance.

[clang-tidy] Fix llvm-header-guard for Windows paths containing drive letter (e.g. C:).
Needs ReviewPublic

Authored by curdeius on Dec 14 2021, 1:53 AM.

Details

Summary

Fixes http://llvm.org/PR52682.

The previous fix (https://reviews.llvm.org/rGb4f6f1c) escaped backslashes, but didn't cover colons.

So the check, for a file called e.g. "C:\test\test.h" would suggest the guard C:_TEST_TEST_H being an invalid name due to the presence of the colon.

Diff Detail

Event Timeline

curdeius created this revision.Dec 14 2021, 1:53 AM
curdeius requested review of this revision.Dec 14 2021, 1:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 14 2021, 1:53 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
sberg added a subscriber: sberg.Dec 14 2021, 1:59 AM

So the check, for a file called e.g. "C:\test\test.h" would suggest the guard C:_TEST_TEST_H being an invalid name due to the presence of the colon.

...though the new suggestion C__TEST_TEST_H isn't ideal either, in that it uses an identifier that is reserved for the implementation (due to the double underscore)

salman-javed-nz added a comment.EditedDec 14 2021, 2:15 AM

The problem at the heart of all this is that llvm-header-guard isn't written flexible enough to support non-LLVM project structures.

See https://github.com/llvm/llvm-project/blob/main/clang-tools-extra/clang-tidy/llvm/HeaderGuardCheck.cpp#L44

For a path like C:\llvm-project\path\to\file, the llvm-project portion is replaced with llvm to give C:\llvm\path\to\file, then a substr() call strips everything up to and including llvm, resulting in path\to\file.
The path separators are replaced with underscores, resulting in PATH_TO_FILE.

The whole check falls apart if it can't find strings like "llvm-project" in the path.

Edit: I'm assume you're seeing this problem outside of the LLVM project?

clang-tools-extra/clang-tidy/llvm/HeaderGuardCheck.cpp
57

Are there other characters we should be sanitising here?
(Lest keep revisiting this code to add more characters to the list)

clang-tools-extra/clang-tidy/llvm/HeaderGuardCheck.cpp
57

Typo:
*Lest we keep revisiting

So the check, for a file called e.g. "C:\test\test.h" would suggest the guard C:_TEST_TEST_H being an invalid name due to the presence of the colon.

...though the new suggestion C__TEST_TEST_H isn't ideal either, in that it uses an identifier that is reserved for the implementation (due to the double underscore)

Indeed, I hadn't thought about it. We *might* change all __+ into a single _, but that seems an overkill to me (or at least something for another patch).
Also, the problem is not limited to :. A file could be named a--b.h and it would get a guard A__B_H with a double underscore.

The problem at the heart of all this is that llvm-header-guard isn't written flexible enough to support non-LLVM project structures.

I know, but this check is still useful outside of LLVM.
From my point of view, there's not much missing flexibility to use it outside LLVM (for basic stuff), only such small edge cases like this one.
Of course, one would wish to be able to set "root" names other than "llvm-project" and so on, but that's out of scope here.

Edit: I'm assume you're seeing this problem outside of the LLVM project?

Yes, in the bug report, it is used outside of LLVM.

clang-tools-extra/clang-tidy/llvm/HeaderGuardCheck.cpp
57

Well, difficult to say. On Windows, there are pretty many allowed characters in filenames, but I don't think we should care about it.
I think we should just care about what is sometimes called "POSIX Fully portable filenames" (which contains: A-Z a-z 0-9 . _ -).
Colon : is special (similarly to slashes / and \) as it may appear in the path.

So the check, for a file called e.g. "C:\test\test.h" would suggest the guard C:_TEST_TEST_H being an invalid name due to the presence of the colon.

...though the new suggestion C__TEST_TEST_H isn't ideal either, in that it uses an identifier that is reserved for the implementation (due to the double underscore)

Indeed, I hadn't thought about it. We *might* change all __+ into a single _, but that seems an overkill to me (or at least something for another patch).

Fixit suggestions should not produce incorrect code and double underscores will introduce undefined behavior in C++ code.

The problem at the heart of all this is that llvm-header-guard isn't written flexible enough to support non-LLVM project structures.

I know, but this check is still useful outside of LLVM.

It could be made to be useful outside of LLVM, but as it stands today, the check is only intended to be useful for LLVM. If we want to make it useful outside of LLVM, that's fine, but there's a bit more to do (for example, the check should be exposed outside of the llvm module) and that includes generalizing the check.

clang-tools-extra/clang-tidy/llvm/HeaderGuardCheck.cpp
57

Huh, I thought we had covered colons as part of the previous review -- sorry for missing that! FWIW, there are other characters that may appear as part of a path. / and \ are both path separators, but ? can appear in file namespace paths (which means . can then appear as part of the file name rather than a separator).

I think we should just care about what is sometimes called "POSIX Fully portable filenames" (which contains: A-Z a-z 0-9 . _ -)

I'd be pretty opposed to anything that takes non-ASCII characters and converts them to _ as that will wind up with header guards like ________ for some users.

salman-javed-nz added a comment.EditedDec 14 2021, 9:50 AM

It could be made to be useful outside of LLVM, but as it stands today, the check is only intended to be useful for LLVM. If we want to make it useful outside of LLVM, that's fine, but there's a bit more to do (for example, the check should be exposed outside of the llvm module) and that includes generalizing the check.

There is https://reviews.llvm.org/D61508 which tries to generalise the llvm-header-guard check. Might be good to build on top of this in the future.

Hi @curdeius, have you had the chance to look at this since the last round of comments? The only thing I think this module still needs for now (until a new generic header guard check is developed), is a check that there aren't consecutive underscores in the output.

If you're busy, I don't mind finishing off this patch. I'm looking for things to do. As the author of the previous commit, I should have caught this case back then.