This patch implements suggestion for typoed directives in preprocessor conditionals. The related issue is: https://github.com/llvm/llvm-project/issues/51598.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
clang/test/OpenMP/predefined_macro.c | ||
---|---|---|
7 ↗ | (On Diff #426262) | I am not sure if this typo was intended. When I renamed elsif to elif, #error "_OPENMP has incorrect value" on line 13 was evaluated. Therefore, if this typo was intended, just suppressing the warning with expected-warning would be better. However, if this typo was NOT intended, I think I should make _OPENMP equal to 201107. It is also possible that this test was mistakenly written. |
Thank you for working on this!
clang/include/clang/Basic/DiagnosticGroups.td | ||
---|---|---|
1023–1024 ↗ | (On Diff #426262) | We don't typically use "typo" in the user-facing part of diagnostics and this group doesn't seem likely to be reused, so I would remove it (another comment on this elsewhere). |
clang/include/clang/Basic/DiagnosticLexKinds.td | ||
362–363 | Rather than adding a warning over the top of an existing error, I think we should modify err_pp_invalid_directive to have an optional "did you mean?" when we find a plausible typo to correct. However, we do not issue that diagnostic when it's inside of a skipped conditional block, and that's what the crux of https://github.com/llvm/llvm-project/issues/51598 is about. As @rsmith pointed out in that issue (and I agree), compilers are free to support custom directives and those will validly appear inside of a conditional block that is skipped. We need to be careful not to diagnose those kinds of situations as an error. However, we still want to diagnose when the unknown directive is "sufficiently close" to another one which can control the conditional chain. e.g., #fi FOO // error unknown directive, did you mean #if? #endfi // error unknown directive, did you mean #endif? #if FOO #esle // diag: unknown directive, did you mean #else? #elfi // diag: unknown directive, did you mean #elif? #elfidef // diag: unknown directive, did you mean #elifdef #elinfdef // diag: unknown directive, did you mean #elifndef #frobble // No diagnostic, not close enough to a conditional directive to warrant diagnosing #eerp // No diagnostic, not close enough to a conditional directive to warrant diagnosing #endif Today, if FOO is defined to a nonzero value, you'll get diagnostics for all of those, but if FOO is not defined or is defined to 0, then there's no diagnostics. I think we want to consider directives that are *very likely* to be a typo (edit distance of 1, maybe 2?) for a conditional directive as a special case. Currently, we only diagnose unknown directives as an error. I think for these special cased conditional directive diagnostics, we'll want to use a warning rather than an error in this circumstance (just in case it turns out to be a valid directive in a skipped conditional block). If we do go that route and make it a warning, I think the warning group should be -Wunknown-directives to mirror -Wunknown-pragmas, -Wunknown-attributes, etc and it should be defined to have the same text as the error case. e.g., def err_pp_invalid_directive : Error< "invalid preprocessing directive%select{|; did you mean '#%1'?}0" >; def warn_pp_invalid_directive : Warning< err_pp_invalid_directive.Text>, InGroup<DiagGroup<"unknown-directives">>; WDYT? (These were my thoughts before seeing the rest of the patch. After reading the patch, it looks like we have pretty similar ideas here, which is great, but leaving the comment anyway in case you have further opinions.) | |
clang/include/clang/Tooling/LevDistance.h | ||
1 ↗ | (On Diff #426262) | You shouldn't need this file or the implementation file -- we have this functionality already in: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/edit_distance.h and https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/StringRef.h#L240. |
clang/lib/Lex/PPDirectives.cpp | ||
484–492 | Mostly just cleaning up for coding conventions, but also, no need to use a std::array and we typically don't use local top-level const qualification. | |
clang/test/OpenMP/predefined_macro.c | ||
7 ↗ | (On Diff #426262) | I tracked down the root cause here and fixed the bug in 50b51b1860acbfb775d5e2eee3310e25c635d667, so when you rebase on main you won't have to deal with this any longer. Good catch! |
clang/test/Preprocessor/suggest-typoed-directive.c | ||
2 | I'd drop the -pedantic from here, then we don't need the expected warning on the last line. I think you should drop the -std=c99 as well so there's a RUN line that's not tied to a specific release, then I'd add a RUN line explicitly for c2x. | |
11 | It's interesting that this one suggested #elifdef instead of #elifndef -- is there anything that can be done for that? Also, one somewhat interesting question is whether we want to recommend #elifdef and #elifndef outside of C2x/C++2b mode. Those directives only exist in the latest language standard, but Clang supports them as a conforming extension in all language modes. Given that this diagnostic is about typos, I think I'm okay suggesting the directives even in older language modes. That's as likely to be a correct suggestion as not, IMO. |
Thank you so much for your clear review!
clang/include/clang/Basic/DiagnosticGroups.td | ||
---|---|---|
1023–1024 ↗ | (On Diff #426262) | Ah, I see. Thank you! |
clang/include/clang/Basic/DiagnosticLexKinds.td | ||
362–363 |
For now, I totally agree with deriving a new warning from err_pp_invalid_directive. However, for future scalability, I think it would be great if we could split those diagnostics into Error & Warning and Help, for example. Rustc does split the diagnostics like the following, and I think this is quite clear. So, a bit apart from this patch, I speculate creating a diagnostic system that can split them would bring Clang diagnostics much more readable. | |
clang/include/clang/Tooling/LevDistance.h | ||
1 ↗ | (On Diff #426262) | I totally missed the implementations! Sorry. But where should I put both findSimilarStr & findSimilarStr? |
clang/lib/Lex/PPDirectives.cpp | ||
484–492 | Thank you! Just wondering, but is there any reason not to use the const qualifier? | |
clang/test/OpenMP/predefined_macro.c | ||
7 ↗ | (On Diff #426262) | Thank you for fixing the bug! I could confirm the bug was fixed upstream. |
clang/test/Preprocessor/suggest-typoed-directive.c | ||
11 |
I found I have to use std::min_element instead of std::max_element because we are finding the nearest (most minimum distance) string. Meanwhile, #elfindef has 2 distance with both #elifdef and #elifndef. After replacing std::max_element with std::min_element, I could suggest #elifndef from #elfinndef.
I agree with you because Clang implements those directives, and the suggested code will also be compilable. I personally think only not compilable suggestions should be avoided. (Or, we might place additional info for outside of C2x/C++2b mode like this is a C2x/C++2b feature but compilable on Clang?) According to the algorithm of std::min_element, we only get an iterator of the first element even if there is another element that has the same distance. So, #elfindef only suggests #elifdef in accordance with the order of Candidates, and I don't think it is beautiful to depend on the order of candidates. I would say that we can suggest all the same distance like the following, but I'm not sure this is preferable: #elfindef // diag: unknown directive, did you mean #elifdef or #elifndef? |
clang/include/clang/Tooling/LevDistance.h | ||
---|---|---|
1 ↗ | (On Diff #426262) | The computation results are not matched with my implementation and another language implementation. So, several directives that could be suggested in my implementation are no longer possible to be suggested, such as #id -> #if, #elid -> #elif, and #elsi -> #else, when using llvm::ComputeEditDistance. The implementation of llvm::ComputeEditDistance might be also correct, but some distances seem to be calculated longer, which causes fewer suggestions. Should I keep going with this implementation or not? size_t Dist = Str1.edit_distance(Str2, false); Str1: id, Str2: if, Dist: 2 # <-- here Str1: id, Str2: ifdef, Dist: 3 Str1: id, Str2: ifndef, Dist: 4 Str1: ifd, Str2: if, Dist: 1 Str1: ifd, Str2: ifdef, Dist: 2 Str1: ifd, Str2: ifndef, Dist: 3 Str1: id, Str2: if, Dist: 1 # <-- here Str1: id, Str2: ifdef, Dist: 3 Str1: id, Str2: ifndef, Dist: 4 Str1: ifd, Str2: if, Dist: 1 Str1: ifd, Str2: ifdef, Dist: 2 Str1: ifd, Str2: ifndef, Dist: 3 Str1: id, Str2: if, Dist: 1 # <-- here Str1: id, Str2: ifdef, Dist: 3 Str1: id, Str2: ifndef, Dist: 4 Str1: ifd, Str2: if, Dist: 1 Str1: ifd, Str2: ifdef, Dist: 2 Str1: ifd, Str2: ifndef, Dist: 3 |
clang/include/clang/Basic/DiagnosticLexKinds.td | ||
---|---|---|
362–363 |
CC @cjdb for awareness. (He's working on a project for improving the way in which we communicate diagnostics to users.) | |
clang/lib/Lex/PPDirectives.cpp | ||
484–492 |
We have some quite inconsistent const correctness in our interfaces, so declaring local objects as const sometimes requires viral changes to make things compile. (I think we've mostly excised the uses of const_cast that would cast away constness in situations which weren't guaranteed to be safe in.) | |
clang/test/Preprocessor/suggest-typoed-directive.c | ||
11 |
I may be changing my mind on this a bit. I now see we don't issue an extension warning when using #elifdef or #elifndef in older language modes. That means suggesting those will be correct but only for Clang, and the user won't have any other diagnostics to tell them about the portability issue. And those particular macros are reasonably likely to be used in a header where the user is trying to aim for portability. So I'm starting to think we should only suggest those two in C2x mode (and we should probably add a portability warning for #elifdef and #elifndef, so I filed: https://github.com/llvm/llvm-project/issues/55306)
The way we typically handle this is to attach FixIt hints to a note instead of to the diagnostic. This way, automatic fixes aren't applied (because there are multiple choices to pick from) but the user is still able to apply whichever fix they want in an IDE or other tool. It might be worth trying that approach (e.g., if there's only one candidate, attach it to the warning, and if there are two or more, emit a warning without a "did you mean" in it and use a new note for the fixit. e.g., #elfindef // expected-warning {{invalid preprocessing directive}} \ expected-note {{did you mean '#elifdef'?}} \ expected-note {{did you mean '#elifndef'?}} WDYT? |
clang/include/clang/Basic/DiagnosticLexKinds.td | ||
---|---|---|
362–363 | Thank you for CCing him! You’re welcome! I currently think rustc's diagnostics like the following are clear to users rather than attaching a help to a warning like unknown directive found, did you mean #Hoge?. So, I think it would be better if we could implement diagnostics to be able to emit info like rustc does. https://reviews.llvm.org/D125078 Could I have your opinion here? | |
clang/lib/Lex/PPDirectives.cpp | ||
484–492 | Oh, I see thank you. This is a too difficult problem when working with many contributors though... | |
clang/test/Preprocessor/suggest-typoed-directive.c | ||
11 |
Certainly, it would be less confusing to the user to avoid suggestions regarding those two. Thank you for submitting the issue, I also would like to work on it.
This is really cool, but don't you care about the redundancy of did you mean in terms of the llvm team culture? | |
llvm/include/llvm/ADT/StringRef.h | ||
267 ↗ | (On Diff #427263) | It seems my previous comment that I posted on deleted file disappeared. Could you please check the following link comment? |
clang/test/Preprocessor/suggest-typoed-directive.c | ||
---|---|---|
11 |
+1, thank you!
I would care if the list were potentially unbounded (like, say, with identifiers in general), but because we know this list will only have a max of two entries on it in this case, it seems reasonable to me. I double-checked with @erichkeane to see if he thought it would be an issue, and he agreed that it being a fixed list makes it pretty reasonable. However, that discussion did raise a question -- why are there two suggestions? elifdef requires a swap + delete and elifndef requires just a swap, so we would have thought that it would have been the only option in the list. | |
llvm/include/llvm/ADT/StringRef.h | ||
267 ↗ | (On Diff #427263) | Thank you for calling this out, I had missed it! I'll respond here.
I don't think we should add another implementation of Levenshtein distances; we should stick with the base functionality, which is what's used by the existing typo correction logic: https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaLookup.cpp#L4308 Ideally (but I'm not asking you to do this because it's likely a very big ask), we'd generalize the TypoCorrectionConsumer and related classes so that it can be used during Lex, Parse, or Sema with callbacks from the typo correction consumer to obtain the list of potential names for correction. However, that functionality is so tightly tied to Sema, it may not be particularly plausible. One thing I noticed is that your implementation does not allow replacements, while the typo correction does. I think that likely explains the behavioral difference you're seeing between implementations. |
clang/lib/Lex/PPDirectives.cpp | ||
---|---|---|
486 | Both C and C++ have the new directives and we enable them as an extension in all language modes, so no need to care about it being in C mode specifically. However, I think we do need to care about *which* C and C++ mode we're in so that we only suggest the new directives in C2x and C++2b mode. |
clang/lib/Lex/PPDirectives.cpp | ||
---|---|---|
489 | Thank you. I replaced it with the actual type. | |
clang/test/Preprocessor/suggest-typoed-directive.c | ||
11 | With the implementation of Lev distances used in llvm, I could simply suggest #elifdef from #elfidef and #elifndef from #elfindef. So, in this situation, do you think that we still need to add two notes for conflicted distances? | |
llvm/include/llvm/ADT/StringRef.h | ||
267 ↗ | (On Diff #427263) | Thank you. I updated the test to fit the existing implementation. |
clang/test/Preprocessor/suggest-typoed-directive.c | ||
---|---|---|
4 | This still suggests that something's wrong as I would imagine this would have an edit distance of 1. Oh, interesting... setting the replacement option to false may have made things better for the elfindef case but worse for the id case? This is tricky because we want to identify things that are most likely simple typos but exclude things that may reasonably not be a typo but a custom preprocessor directive. Based on that, I *think* setting the replacement option to true gives the more conservative answer (it treats a replacement as 1 edit rather than 2). @erichkeane -- do you have thoughts? | |
11 | No, let's skip the two note behavior. If we find ourselves with multiple suggestions, we'll just leave off the "did you mean?" part of the diagnostic entirely. |
clang/test/Preprocessor/suggest-typoed-directive.c | ||
---|---|---|
4 | Not particularly. I don't have a good hold of how much we want to suggest with the 'did you mean'. Line 9 and line 10 here are unfortunate, I would hope those would happen? Its unfortunate we don't have a way to figure out these common typos. | |
11 | Might I suggest we just emit the 'first' suggestion? This isn't really perfect, but I would think that our users put very little thought into it when we suggest the wrong thing. |
clang/test/Preprocessor/suggest-typoed-directive.c | ||
---|---|---|
4 | Yeah, ideally I want this to "do what I'm thinking" and make all three situations work. :-D | |
11 | Notionally that makes sense, but the situation I'm worried about is when users pass -fixit to the compiler -- then it automatically does whatever, but we have no idea whether that whatever is reasonable or not when we have more than one option. |
clang/test/Preprocessor/suggest-typoed-directive.c | ||
---|---|---|
11 | I could suggest #if from #id, but I had to change #elfindef to #elfinndef to suggest #elifndef. |
clang/test/Preprocessor/suggest-typoed-directive.c | ||
---|---|---|
11 | Thanks! I'm coming around more to the idea that we'll never get the "perfect" set of suggestions, and so whatever suggestions we get are quite likely good enough. However, I think it would be helpful to also add the tests where the behavior seems a bit strange, like #elfindef, just to show that we know the behavior is what it is and it doesn't do bad things like crash or whatnot. | |
llvm/lib/Support/StringRef.cpp | ||
102 ↗ | (On Diff #428471) | I am less convinced that we want this as a general interface on StringRef. I think callers need to decide whether something is similar enough or not. For example, case sensitivity may not matter for this case but it might matter to another case. Others may wish to limit the max edit_distance for each of the candidates for performance reasons, others may not. We already saw there were questions about whether to allow replacements or not. That sort of thing. I think this functionality should be moved to PPDirectives.cpp as a static helper function that's specific to this use. Also, because this returns one of the candidates passed in, and those are a StringRef, I think this function should return a StringRef -- this removes allocation costs. I'd also drop the Dist parameter as the function is no longer intended to be a general purpose one. |
118–135 ↗ | (On Diff #428471) | I think we can skip the vector entirely by keeping track of the min edit distance we've seen thus far and which candidate that corresponds to when looping over the candidates. |
clang/test/Preprocessor/suggest-typoed-directive.c | ||
---|---|---|
11 | Thank you :) Ok, I'll add the test as well. | |
llvm/lib/Support/StringRef.cpp | ||
102 ↗ | (On Diff #428471) | I am also in favor of it, but where should I put tests for this functionality? |
118–135 ↗ | (On Diff #428471) | 👍 |
llvm/lib/Support/StringRef.cpp | ||
---|---|---|
102 ↗ | (On Diff #428471) | This is something we wouldn't usually add tests for directly but instead test the behavior of through lit -- the tests you already have exercise this code path and would show if there's a situation where we expect a viable candidate and don't find any. So I'd not worry about test coverage for it in this case. |
Thank you for your support!
I updated the code, so could you please review this patch again?
llvm/lib/Support/StringRef.cpp | ||
---|---|---|
102 ↗ | (On Diff #428471) | I see. Thank you! |
clang/test/Preprocessor/suggest-typoed-directive.c | ||
---|---|---|
37 | Here, #elfindef is suggested to #elifdef, not #elifndef, but I believe it will help many users to realize that they have typo'd #elifndef, or maybe they might have meant actually #elifdef. |
You should also come up with a release note for the changes.
clang/include/clang/Basic/DiagnosticLexKinds.td | ||
---|---|---|
434–435 | ||
clang/lib/Lex/PPDirectives.cpp | ||
488–490 | ||
1266–1267 | I think we should be attempting to suggest a typo for the error case as well e.g., #fi WHATEVER #endif we currently give no suggestion for that typo, just the error. However, this may require a fair amount of changes because of the various edge cases where we give better diagnostics than "unknown directive". e.g., #if WHATEVER // error: unterminated conditional directive #endfi // no diagnostic so if it looks like covering error cases is going to be involved, I'm fine doing it in a follow-up if you'd prefer. | |
clang/test/Preprocessor/suggest-typoed-directive.c | ||
37 | +1 |
Thank you for your review :)
clang/lib/Lex/PPDirectives.cpp | ||
---|---|---|
1266–1267 | The former can be implemented easily, but the latter seems not easy. |
clang/lib/Lex/PPDirectives.cpp | ||
---|---|---|
1266–1267 | I'm fine doing the error cases entirely in another patch. The current approach here is a problem because it issues a warning and an error for the same line of code. So I'd go back to the way the code was before, and in a follow-up you can handle errors. I think one approach to consider for that is having an argument to SuggestTypoedDirective() as to whether to err or warn so that we can reuse the same logic. |
clang/lib/Lex/PPDirectives.cpp | ||
---|---|---|
1266–1267 | OK, reverting 👍 I'll work on it. Thank you. |
I prepared a new public email address: vcs@kmatsui.me instead of the auto-generated one.
So, could you please use the new one for future commit?
LGTM, thank you for this! I adjusted the release note and commit message somewhat when landing to make sure they were detailed enough.
I see an instance of this warning in the Linux kernel due to the "Now, for unknown directives inside a skipped conditional block, we diagnose the unknown directive as a warning if it is sufficiently similar to a directive specific to preprocessor conditional blocks" part of this change:
arch/x86/crypto/aesni-intel_asm.S:532:8: warning: invalid preprocessing directive, did you mean '#if'? [-Wunknown-directives] # in in order to perform ^ arch/x86/crypto/aesni-intel_asm.S:547:8: warning: invalid preprocessing directive, did you mean '#if'? [-Wunknown-directives] # in in order to perform ^ 2 warnings generated.
This is a comment within an assembler file that will be preprocessed (this is a 32-bit x86 build and the block is guarded by #ifdef __x86_64__ on line 500):
Is there anything that can be done to improve this heuristic for this case? I can try to push a patch to change the comment style for this one instance but I always worry that a warning of this nature will appear in the future and result in the kernel disabling this warning entirely.
Oh wow, that's a really neat one!
This is a comment within an assembler file that will be preprocessed (this is a 32-bit x86 build and the block is guarded by #ifdef __x86_64__ on line 500):
Is there anything that can be done to improve this heuristic for this case? I can try to push a patch to change the comment style for this one instance but I always worry that a warning of this nature will appear in the future and result in the kernel disabling this warning entirely.
Ah, and we don't get an error for those because of special logic: https://github.com/llvm/llvm-project/blob/main/clang/lib/Lex/PPDirectives.cpp#L1243 and it looks like we may need similar logic before issuing the warnings as well.
Oops, thank you for your comment!
I would like to work on a hotfix patch for this issue.
Should we entirely opt-out of this suggestion on .S files? I think there are other approaches here, such as decreasing the max edit distance to avoid suggesting this case in .S files, but this approach is avoiding just this edge case so that it would possibly bring a new issue like this. Conversely, opting out of this suggestion on the whole .S files will not be able to catch any typoes. Considering possible edge cases (# directives are also treated as comments), I think opting out would be the only way, unfortunately. What do you think?
For now, I will create a patch opting out of this suggestion on .S files.
I believe the if (getLangOpts().AsmPreprocessor) condition is appropriately accurate here. I don't think changing the edit distance would be a good behavior for anyone, and I don't see any other appropriate heuristics.
Rather than adding a warning over the top of an existing error, I think we should modify err_pp_invalid_directive to have an optional "did you mean?" when we find a plausible typo to correct.
However, we do not issue that diagnostic when it's inside of a skipped conditional block, and that's what the crux of https://github.com/llvm/llvm-project/issues/51598 is about. As @rsmith pointed out in that issue (and I agree), compilers are free to support custom directives and those will validly appear inside of a conditional block that is skipped. We need to be careful not to diagnose those kinds of situations as an error. However, we still want to diagnose when the unknown directive is "sufficiently close" to another one which can control the conditional chain. e.g.,
Today, if FOO is defined to a nonzero value, you'll get diagnostics for all of those, but if FOO is not defined or is defined to 0, then there's no diagnostics. I think we want to consider directives that are *very likely* to be a typo (edit distance of 1, maybe 2?) for a conditional directive as a special case.
Currently, we only diagnose unknown directives as an error. I think for these special cased conditional directive diagnostics, we'll want to use a warning rather than an error in this circumstance (just in case it turns out to be a valid directive in a skipped conditional block). If we do go that route and make it a warning, I think the warning group should be -Wunknown-directives to mirror -Wunknown-pragmas, -Wunknown-attributes, etc and it should be defined to have the same text as the error case. e.g.,
WDYT?
(These were my thoughts before seeing the rest of the patch. After reading the patch, it looks like we have pretty similar ideas here, which is great, but leaving the comment anyway in case you have further opinions.)