This is an archive of the discontinued LLVM Phabricator instance.

Paths

Table of Contentst

-
clang/
-
include/clang/
-
clang/
-
Basic/
1/5
DiagnosticLexKinds.td
-
Lex/
-
Preprocessor.h
-
lib/Lex/
-
Lex/
4/13
PPDirectives.cpp
-
test/Preprocessor/
-
Preprocessor/
-
suggest-typoed-directive-c2x-cpp2b.c
2/18
suggest-typoed-directive.c
-
llvm/
-
include/llvm/ADT/
-
llvm/
-
ADT/
3
StringRef.h
-
lib/Support/
-
Support/
2/6
StringRef.cpp
-
unittests/ADT/
-
ADT/
-
StringRefTest.cpp

Differential D124726

Suggest typoed directives in preprocessor conditionals
ClosedPublic

Authored by ken-matsui on Apr 30 2022, 8:34 PM.

Download Raw Diff

Details

Reviewers

aaron.ballman

Commits

rGa247ba9d1563: Suggest typo corrections for preprocessor directives

Summary

This patch implements suggestion for typoed directives in preprocessor conditionals. The related issue is: https://github.com/llvm/llvm-project/issues/51598.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit TestsFailed

	Time	Test
	60,050 ms	x64 debian > ThreadSanitizer-x86_64.ThreadSanitizer-x86_64::restore_stack.cpp
	60,090 ms	x64 debian > libFuzzer.libFuzzer::large.test

Event Timeline

ken-matsui created this revision.Apr 30 2022, 8:34 PM

Herald added a project: Restricted Project. · View Herald TranscriptApr 30 2022, 8:34 PM

Herald added a subscriber: mgorny. · View Herald Transcript

ken-matsui requested review of this revision.Apr 30 2022, 8:34 PM

Herald added a project: Restricted Project. · View Herald TranscriptApr 30 2022, 8:34 PM

Herald added a subscriber: cfe-commits. · View Herald Transcript

Harbormaster completed remote builds in B162132: Diff 426262.Apr 30 2022, 8:38 PM

ken-matsui added inline comments.Apr 30 2022, 8:50 PM

clang/test/OpenMP/predefined_macro.c
7 ↗	(On Diff #426262)	I am not sure if this typo was intended. When I renamed `elsif` to `elif`, `#error "_OPENMP has incorrect value"` on line `13` was evaluated. Therefore, if this typo was intended, just suppressing the warning with `expected-warning` would be better. However, if this typo was NOT intended, I think I should make `_OPENMP` equal to `201107`. It is also possible that this test was mistakenly written.

Thank you for working on this!

clang/include/clang/Basic/DiagnosticGroups.td
1023–1024 ↗	(On Diff #426262)	We don't typically use "typo" in the user-facing part of diagnostics and this group doesn't seem likely to be reused, so I would remove it (another comment on this elsewhere).
clang/include/clang/Basic/DiagnosticLexKinds.td
362–363	Rather than adding a warning over the top of an existing error, I think we should modify `err_pp_invalid_directive` to have an optional "did you mean?" when we find a plausible typo to correct. However, we do not issue that diagnostic when it's inside of a skipped conditional block, and that's what the crux of https://github.com/llvm/llvm-project/issues/51598 is about. As @rsmith pointed out in that issue (and I agree), compilers are free to support custom directives and those will validly appear inside of a conditional block that is skipped. We need to be careful not to diagnose those kinds of situations as an error. However, we still want to diagnose when the unknown directive is "sufficiently close" to another one which can control the conditional chain. e.g., #fi FOO // error unknown directive, did you mean #if? #endfi // error unknown directive, did you mean #endif? #if FOO #esle // diag: unknown directive, did you mean #else? #elfi // diag: unknown directive, did you mean #elif? #elfidef // diag: unknown directive, did you mean #elifdef #elinfdef // diag: unknown directive, did you mean #elifndef #frobble // No diagnostic, not close enough to a conditional directive to warrant diagnosing #eerp // No diagnostic, not close enough to a conditional directive to warrant diagnosing #endif Today, if `FOO` is defined to a nonzero value, you'll get diagnostics for all of those, but if `FOO` is not defined or is defined to 0, then there's no diagnostics. I think we want to consider directives that are very likely to be a typo (edit distance of 1, maybe 2?) for a conditional directive as a special case. Currently, we only diagnose unknown directives as an error. I think for these special cased conditional directive diagnostics, we'll want to use a warning rather than an error in this circumstance (just in case it turns out to be a valid directive in a skipped conditional block). If we do go that route and make it a warning, I think the warning group should be `-Wunknown-directives` to mirror `-Wunknown-pragmas`, `-Wunknown-attributes`, etc and it should be defined to have the same text as the error case. e.g., def err_pp_invalid_directive : Error< "invalid preprocessing directive%select{\|; did you mean '#%1'?}0" >; def warn_pp_invalid_directive : Warning< err_pp_invalid_directive.Text>, InGroup<DiagGroup<"unknown-directives">>; WDYT? (These were my thoughts before seeing the rest of the patch. After reading the patch, it looks like we have pretty similar ideas here, which is great, but leaving the comment anyway in case you have further opinions.)
clang/include/clang/Tooling/LevDistance.h
1 ↗	(On Diff #426262)	You shouldn't need this file or the implementation file -- we have this functionality already in: https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/edit_distance.h and https://github.com/llvm/llvm-project/blob/main/llvm/include/llvm/ADT/StringRef.h#L240.
clang/lib/Lex/PPDirectives.cpp
439–447	Mostly just cleaning up for coding conventions, but also, no need to use a `std::array` and we typically don't use local top-level `const` qualification.
clang/test/OpenMP/predefined_macro.c
7 ↗	(On Diff #426262)	I tracked down the root cause here and fixed the bug in 50b51b1860acbfb775d5e2eee3310e25c635d667, so when you rebase on main you won't have to deal with this any longer. Good catch!
clang/test/Preprocessor/suggest-typoed-directive.c
2	I'd drop the `-pedantic` from here, then we don't need the expected warning on the last line. I think you should drop the `-std=c99` as well so there's a RUN line that's not tied to a specific release, then I'd add a RUN line explicitly for c2x.
11	It's interesting that this one suggested `#elifdef` instead of `#elifndef` -- is there anything that can be done for that? Also, one somewhat interesting question is whether we want to recommend `#elifdef` and `#elifndef` outside of C2x/C++2b mode. Those directives only exist in the latest language standard, but Clang supports them as a conforming extension in all language modes. Given that this diagnostic is about typos, I think I'm okay suggesting the directives even in older language modes. That's as likely to be a correct suggestion as not, IMO.

Thank you so much for your clear review!

clang/include/clang/Basic/DiagnosticGroups.td
1023–1024 ↗	(On Diff #426262)	Ah, I see. Thank you!
clang/include/clang/Basic/DiagnosticLexKinds.td
362–363	Currently, we only diagnose unknown directives as an error. I think for these special cased conditional directive diagnostics, we'll want to use a warning rather than an error in this circumstance (just in case it turns out to be a valid directive in a skipped conditional block). If we do go that route and make it a warning, I think the warning group should be `-Wunknown-directives` to mirror `-Wunknown-pragmas`, `-Wunknown-attributes`, etc and it should be defined to have the same text as the error case. e.g., def err_pp_invalid_directive : Error< "invalid preprocessing directive%select{\|; did you mean '#%1'?}0" >; def warn_pp_invalid_directive : Warning< err_pp_invalid_directive.Text>, InGroup<DiagGroup<"unknown-directives">>; WDYT? (These were my thoughts before seeing the rest of the patch. After reading the patch, it looks like we have pretty similar ideas here, which is great, but leaving the comment anyway in case you have further opinions.) For now, I totally agree with deriving a new warning from `err_pp_invalid_directive`. However, for future scalability, I think it would be great if we could split those diagnostics into Error & Warning and Help, for example. Rustc does split the diagnostics like the following, and I think this is quite clear. So, a bit apart from this patch, I speculate creating a diagnostic system that can split them would bring Clang diagnostics much more readable. https://github.com/rust-lang/rust/blob/598d89bf142823b5d84e2eb0f0f9e418ee966a4b/src/test/ui/suggestions/suggest-trait-items.stderr
clang/include/clang/Tooling/LevDistance.h
1 ↗	(On Diff #426262)	I totally missed the implementations! Sorry. But where should I put both `findSimilarStr` & `findSimilarStr`? It seems that their same implementations aren't there.
clang/lib/Lex/PPDirectives.cpp
439–447	Thank you! Just wondering, but is there any reason not to use the `const` qualifier?
clang/test/OpenMP/predefined_macro.c
7 ↗	(On Diff #426262)	Thank you for fixing the bug! I could confirm the bug was fixed upstream.
clang/test/Preprocessor/suggest-typoed-directive.c
11	It's interesting that this one suggested `#elifdef` instead of `#elifndef` -- is there anything that can be done for that? I found I have to use `std::min_element` instead of `std::max_element` because we are finding the nearest (most minimum distance) string. Meanwhile, `#elfindef` has 2 distance with both `#elifdef` and `#elifndef`. After replacing `std::max_element` with `std::min_element`, I could suggest `#elifndef` from `#elfinndef`. Also, one somewhat interesting question is whether we want to recommend `#elifdef` and `#elifndef` outside of C2x/C++2b mode. Those directives only exist in the latest language standard, but Clang supports them as a conforming extension in all language modes. Given that this diagnostic is about typos, I think I'm okay suggesting the directives even in older language modes. That's as likely to be a correct suggestion as not, IMO. I agree with you because Clang implements those directives, and the suggested code will also be compilable. I personally think only not compilable suggestions should be avoided. (Or, we might place additional info for outside of C2x/C++2b mode like `this is a C2x/C++2b feature but compilable on Clang`?) According to the algorithm of `std::min_element`, we only get an iterator of the first element even if there is another element that has the same distance. So, `#elfindef` only suggests `#elifdef` in accordance with the order of `Candidates`, and I don't think it is beautiful to depend on the order of candidates. I would say that we can suggest all the same distance like the following, but I'm not sure this is preferable: #elfindef // diag: unknown directive, did you mean #elifdef or #elifndef?

ken-matsui added inline comments.May 5 2022, 1:13 AM

clang/include/clang/Tooling/LevDistance.h
1 ↗	(On Diff #426262)	@aaron.ballman The computation results are not matched with my implementation and another language implementation. So, several directives that could be suggested in my implementation are no longer possible to be suggested, such as `#id` -> `#if`, `#elid` -> `#elif`, and `#elsi` -> `#else`, when using `llvm::ComputeEditDistance`. The implementation of `llvm::ComputeEditDistance` might be also correct, but some distances seem to be calculated longer, which causes fewer suggestions. Should I keep going with this implementation or not? llvm::ComputeEditDistance: size_t Dist = Str1.edit_distance(Str2, false); Str1: id, Str2: if, Dist: 2 # <-- here Str1: id, Str2: ifdef, Dist: 3 Str1: id, Str2: ifndef, Dist: 4 Str1: ifd, Str2: if, Dist: 1 Str1: ifd, Str2: ifdef, Dist: 2 Str1: ifd, Str2: ifndef, Dist: 3 My implementation: Str1: id, Str2: if, Dist: 1 # <-- here Str1: id, Str2: ifdef, Dist: 3 Str1: id, Str2: ifndef, Dist: 4 Str1: ifd, Str2: if, Dist: 1 Str1: ifd, Str2: ifdef, Dist: 2 Str1: ifd, Str2: ifndef, Dist: 3 Rustc implementation: Str1: id, Str2: if, Dist: 1 # <-- here Str1: id, Str2: ifdef, Dist: 3 Str1: id, Str2: ifndef, Dist: 4 Str1: ifd, Str2: if, Dist: 1 Str1: ifd, Str2: ifdef, Dist: 2 Str1: ifd, Str2: ifndef, Dist: 3

Update codes as reviewed

Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2022, 3:52 AM

Herald added a subscriber: llvm-commits. · View Herald Transcript

Remove unnecessary includes

Harbormaster completed remote builds in B162873: Diff 427263.May 5 2022, 6:46 AM

aaron.ballman added a subscriber: cjdb.May 6 2022, 5:12 AM

aaron.ballman added inline comments.

clang/include/clang/Basic/DiagnosticLexKinds.td
362–363	However, for future scalability, I think it would be great if we could split those diagnostics into Error & Warning and Help, for example. Rustc does split the diagnostics like the following, and I think this is quite clear. So, a bit apart from this patch, I speculate creating a diagnostic system that can split them would bring Clang diagnostics much more readable. CC @cjdb for awareness. (He's working on a project for improving the way in which we communicate diagnostics to users.)
clang/lib/Lex/PPDirectives.cpp
439–447	Just wondering, but is there any reason not to use the const qualifier? We have some quite inconsistent const correctness in our interfaces, so declaring local objects as const sometimes requires viral changes to make things compile. (I think we've mostly excised the uses of `const_cast` that would cast away constness in situations which weren't guaranteed to be safe in.)
clang/test/Preprocessor/suggest-typoed-directive.c
11	I agree with you because Clang implements those directives, and the suggested code will also be compilable. I personally think only not compilable suggestions should be avoided. (Or, we might place additional info for outside of C2x/C++2b mode like this is a C2x/C++2b feature but compilable on Clang?) I may be changing my mind on this a bit. I now see we don't issue an extension warning when using `#elifdef` or `#elifndef` in older language modes. That means suggesting those will be correct but only for Clang, and the user won't have any other diagnostics to tell them about the portability issue. And those particular macros are reasonably likely to be used in a header where the user is trying to aim for portability. So I'm starting to think we should only suggest those two in C2x mode (and we should probably add a portability warning for #elifdef and #elifndef, so I filed: https://github.com/llvm/llvm-project/issues/55306) I would say that we can suggest all the same distance like the following, but I'm not sure this is preferable: The way we typically handle this is to attach FixIt hints to a note instead of to the diagnostic. This way, automatic fixes aren't applied (because there are multiple choices to pick from) but the user is still able to apply whichever fix they want in an IDE or other tool. It might be worth trying that approach (e.g., if there's only one candidate, attach it to the warning, and if there are two or more, emit a warning without a "did you mean" in it and use a new note for the fixit. e.g., #elfindef // expected-warning {{invalid preprocessing directive}} \ expected-note {{did you mean '#elifdef'?}} \ expected-note {{did you mean '#elifndef'?}} WDYT?

Thanks for working on this!

ken-matsui added inline comments.May 7 2022, 2:03 AM

clang/include/clang/Basic/DiagnosticLexKinds.td
362–363	Thank you for CCing him! In D124726#3497284, @cjdb wrote: Thanks for working on this! @cjdb You’re welcome! I currently think rustc's diagnostics like the following are clear to users rather than attaching a help to a warning like `unknown directive found, did you mean #Hoge?`. https://github.com/rust-lang/rust/blob/598d89bf142823b5d84e2eb0f0f9e418ee966a4b/src/test/ui/suggestions/suggest-trait-items.stderr So, I think it would be better if we could implement diagnostics to be able to emit info like rustc does. I also implemented a feature to show line numbers in diagnostics, and it may be under review. https://reviews.llvm.org/D125078 Could I have your opinion here?
clang/lib/Lex/PPDirectives.cpp
439–447	Oh, I see thank you. This is a too difficult problem when working with many contributors though...
clang/test/Preprocessor/suggest-typoed-directive.c
11	I may be changing my mind on this a bit. I now see we don't issue an extension warning when using `#elifdef` or `#elifndef` in older language modes. That means suggesting those will be correct but only for Clang, and the user won't have any other diagnostics to tell them about the portability issue. And those particular macros are reasonably likely to be used in a header where the user is trying to aim for portability. So I'm starting to think we should only suggest those two in C2x mode (and we should probably add a portability warning for #elifdef and #elifndef, so I filed: https://github.com/llvm/llvm-project/issues/55306) Certainly, it would be less confusing to the user to avoid suggestions regarding those two. I'm going to fix my code to avoid suggesting them in not C2x mode. Thank you for submitting the issue, I also would like to work on it. The way we typically handle this is to attach FixIt hints to a note instead of to the diagnostic. This way, automatic fixes aren't applied (because there are multiple choices to pick from) but the user is still able to apply whichever fix they want in an IDE or other tool. It might be worth trying that approach (e.g., if there's only one candidate, attach it to the warning, and if there are two or more, emit a warning without a "did you mean" in it and use a new note for the fixit. e.g., #elfindef // expected-warning {{invalid preprocessing directive}} \ expected-note {{did you mean '#elifdef'?}} \ expected-note {{did you mean '#elifndef'?}} WDYT? This is really cool, but don't you care about the redundancy of `did you mean` in terms of the llvm team culture? If not, I will implement notes like the above.
llvm/include/llvm/ADT/StringRef.h
267	It seems my previous comment that I posted on deleted file disappeared. Could you please check the following link comment? https://reviews.llvm.org/D124726#3493222

aaron.ballman added a subscriber: erichkeane.May 9 2022, 8:34 AM

aaron.ballman added inline comments.

clang/test/Preprocessor/suggest-typoed-directive.c
11	Certainly, it would be less confusing to the user to avoid suggestions regarding those two. I'm going to fix my code to avoid suggesting them in not C2x mode. +1, thank you! This is really cool, but don't you care about the redundancy of did you mean in terms of the llvm team culture? If not, I will implement notes like the above. I would care if the list were potentially unbounded (like, say, with identifiers in general), but because we know this list will only have a max of two entries on it in this case, it seems reasonable to me. I double-checked with @erichkeane to see if he thought it would be an issue, and he agreed that it being a fixed list makes it pretty reasonable. However, that discussion did raise a question -- why are there two suggestions? elifdef requires a swap + delete and elifndef requires just a swap, so we would have thought that it would have been the only option in the list.
llvm/include/llvm/ADT/StringRef.h
267	Thank you for calling this out, I had missed it! I'll respond here. Should I keep going with this implementation or not? I don't think we should add another implementation of Levenshtein distances; we should stick with the base functionality, which is what's used by the existing typo correction logic: https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaLookup.cpp#L4308 Ideally (but I'm not asking you to do this because it's likely a very big ask), we'd generalize the `TypoCorrectionConsumer` and related classes so that it can be used during Lex, Parse, or Sema with callbacks from the typo correction consumer to obtain the list of potential names for correction. However, that functionality is so tightly tied to Sema, it may not be particularly plausible. One thing I noticed is that your implementation does not allow replacements, while the typo correction does. I think that likely explains the behavioral difference you're seeing between implementations.

erichkeane added inline comments.May 9 2022, 8:40 AM

clang/lib/Lex/PPDirectives.cpp
441	Should this be dependent on C modes? it would be kind of silly to suggest "invalid preprocessing token elifndef, did you mean elifndef"?
444	I don't believe this meets our requirements for 'auto'.

aaron.ballman added inline comments.May 9 2022, 8:53 AM

clang/lib/Lex/PPDirectives.cpp
441	Both C and C++ have the new directives and we enable them as an extension in all language modes, so no need to care about it being in C mode specifically. However, I think we do need to care about which C and C++ mode we're in so that we only suggest the new directives in C2x and C++2b mode.

Updated the code as reviewed

Herald added a subscriber: hiraditya. · View Herald TranscriptMay 10 2022, 4:18 AM

ken-matsui added inline comments.May 10 2022, 4:18 AM

clang/lib/Lex/PPDirectives.cpp
444	Thank you. I replaced it with the actual type.
clang/test/Preprocessor/suggest-typoed-directive.c
11	With the implementation of Lev distances used in llvm, I could simply suggest `#elifdef` from `#elfidef` and `#elifndef` from `#elfindef`. So, in this situation, do you think that we still need to add two notes for conflicted distances?
llvm/include/llvm/ADT/StringRef.h
267	Thank you. I updated the test to fit the existing implementation. The suggestion for `#id` couldn't be suggested as `#if`.

Harbormaster completed remote builds in B163668: Diff 428335.May 10 2022, 6:00 AM

Fix the test for FindSimilarStr

aaron.ballman added inline comments.May 10 2022, 9:29 AM

clang/test/Preprocessor/suggest-typoed-directive.c
3	This still suggests that something's wrong as I would imagine this would have an edit distance of 1. Oh, interesting... setting the replacement option to `false` may have made things better for the `elfindef` case but worse for the `id` case? This is tricky because we want to identify things that are most likely simple typos but exclude things that may reasonably not be a typo but a custom preprocessor directive. Based on that, I think setting the replacement option to `true` gives the more conservative answer (it treats a replacement as 1 edit rather than 2). @erichkeane -- do you have thoughts?
11	No, let's skip the two note behavior. If we find ourselves with multiple suggestions, we'll just leave off the "did you mean?" part of the diagnostic entirely.

erichkeane added inline comments.May 10 2022, 9:36 AM

clang/test/Preprocessor/suggest-typoed-directive.c
3	Not particularly. I don't have a good hold of how much we want to suggest with the 'did you mean'. Line 9 and line 10 here are unfortunate, I would hope those would happen? Its unfortunate we don't have a way to figure out these common typos.
11	Might I suggest we just emit the 'first' suggestion? This isn't really perfect, but I would think that our users put very little thought into it when we suggest the wrong thing.

aaron.ballman added inline comments.May 10 2022, 10:00 AM

clang/test/Preprocessor/suggest-typoed-directive.c
3	Yeah, ideally I want this to "do what I'm thinking" and make all three situations work. :-D
11	Notionally that makes sense, but the situation I'm worried about is when users pass `-fixit` to the compiler -- then it automatically does whatever, but we have no idea whether that whatever is reasonable or not when we have more than one option.

Harbormaster completed remote builds in B163710: Diff 428395.May 10 2022, 10:02 AM

Set AllowReplacements to true

ken-matsui added inline comments.May 10 2022, 12:57 PM

clang/test/Preprocessor/suggest-typoed-directive.c
11	I could suggest `#if` from `#id`, but I had to change `#elfindef` to `#elfinndef` to suggest `#elifndef`. After setting the `AllowReplacements` option to `true`, `#elfindef` suggested `#elifdef` (I think it might have the same distance with `#elifndef`).

Harbormaster completed remote builds in B163762: Diff 428471.May 10 2022, 2:22 PM

aaron.ballman added inline comments.May 11 2022, 4:59 AM

clang/test/Preprocessor/suggest-typoed-directive.c
11	Thanks! I'm coming around more to the idea that we'll never get the "perfect" set of suggestions, and so whatever suggestions we get are quite likely good enough. However, I think it would be helpful to also add the tests where the behavior seems a bit strange, like `#elfindef`, just to show that we know the behavior is what it is and it doesn't do bad things like crash or whatnot.
llvm/lib/Support/StringRef.cpp
102	I am less convinced that we want this as a general interface on `StringRef`. I think callers need to decide whether something is similar enough or not. For example, case sensitivity may not matter for this case but it might matter to another case. Others may wish to limit the max edit_distance for each of the candidates for performance reasons, others may not. We already saw there were questions about whether to allow replacements or not. That sort of thing. I think this functionality should be moved to PPDirectives.cpp as a static helper function that's specific to this use. Also, because this returns one of the candidates passed in, and those are a `StringRef`, I think this function should return a `StringRef` -- this removes allocation costs. I'd also drop the `Dist` parameter as the function is no longer intended to be a general purpose one.
118–135	I think we can skip the vector entirely by keeping track of the min edit distance we've seen thus far and which candidate that corresponds to when looping over the candidates.

ken-matsui added inline comments.May 11 2022, 6:49 AM

clang/test/Preprocessor/suggest-typoed-directive.c
11	Thank you :) Ok, I'll add the test as well.
llvm/lib/Support/StringRef.cpp
102	I am also in favor of it, but where should I put tests for this functionality? Is it possible to write unit tests for static functions implemented in `.cpp` files?
118–135	👍

aaron.ballman added inline comments.May 11 2022, 9:33 AM

llvm/lib/Support/StringRef.cpp
102	This is something we wouldn't usually add tests for directly but instead test the behavior of through lit -- the tests you already have exercise this code path and would show if there's a situation where we expect a viable candidate and don't find any. So I'd not worry about test coverage for it in this case.

Update the code as reviewed

Thank you for your support!

I updated the code, so could you please review this patch again?

llvm/lib/Support/StringRef.cpp
102	I see. Thank you!

Remove unused includes in StringRef.h

ken-matsui added inline comments.May 11 2022, 6:27 PM

clang/test/Preprocessor/suggest-typoed-directive.c
37	Here, `#elfindef` is suggested to `#elifdef`, not `#elifndef`, but I believe it will help many users to realize that they have typo'd `#elifndef`, or maybe they might have meant actually `#elifdef`.

Harbormaster completed remote builds in B164023: Diff 428827.May 11 2022, 10:29 PM

You should also come up with a release note for the changes.

clang/include/clang/Basic/DiagnosticLexKinds.td
431–432
clang/lib/Lex/PPDirectives.cpp
443–445
1212	I think we should be attempting to suggest a typo for the error case as well e.g., #fi WHATEVER #endif we currently give no suggestion for that typo, just the error. However, this may require a fair amount of changes because of the various edge cases where we give better diagnostics than "unknown directive". e.g., #if WHATEVER // error: unterminated conditional directive #endfi // no diagnostic so if it looks like covering error cases is going to be involved, I'm fine doing it in a follow-up if you'd prefer.
clang/test/Preprocessor/suggest-typoed-directive.c
37	+1

Thank you for your review :)

clang/lib/Lex/PPDirectives.cpp
1212	The former can be implemented easily, but the latter seems not easy. So what about doing the latter in another patch?

Update the code as reviewed and add a release note

Add test for errored directive but no suggestion

aaron.ballman added inline comments.May 12 2022, 10:29 AM

clang/lib/Lex/PPDirectives.cpp
1212	I'm fine doing the error cases entirely in another patch. The current approach here is a problem because it issues a warning and an error for the same line of code. So I'd go back to the way the code was before, and in a follow-up you can handle errors. I think one approach to consider for that is having an argument to `SuggestTypoedDirective()` as to whether to err or warn so that we can reuse the same logic.

ken-matsui added inline comments.May 12 2022, 10:41 AM

clang/lib/Lex/PPDirectives.cpp
1212	OK, reverting 👍 I'll work on it. Thank you.

Revert the changes for errored directives

Harbormaster completed remote builds in B164148: Diff 429005.May 12 2022, 12:53 PM

ken-matsui added a comment.May 13 2022, 1:24 AM

This comment was removed by ken-matsui.

@aaron.ballman

I prepared a new public email address: vcs@kmatsui.me instead of the auto-generated one.
So, could you please use the new one for future commit?

This revision was not accepted when it landed; it landed in state Needs Review.May 13 2022, 6:17 AM

This revision was landed with ongoing or failed builds.

Closed by commit rGa247ba9d1563: Suggest typo corrections for preprocessor directives (authored by ken-matsui, committed by aaron.ballman). · Explain Why

This revision was automatically updated to reflect the committed changes.

aaron.ballman added a commit: rGa247ba9d1563: Suggest typo corrections for preprocessor directives.

LGTM, thank you for this! I adjusted the release note and commit message somewhat when landing to make sure they were detailed enough.

Thank you for your support!

I see an instance of this warning in the Linux kernel due to the "Now, for unknown directives inside a skipped conditional block, we diagnose the unknown directive as a warning if it is sufficiently similar to a directive specific to preprocessor conditional blocks" part of this change:

arch/x86/crypto/aesni-intel_asm.S:532:8: warning: invalid preprocessing directive, did you mean '#if'? [-Wunknown-directives]
                                        # in in order to perform
                                          ^
arch/x86/crypto/aesni-intel_asm.S:547:8: warning: invalid preprocessing directive, did you mean '#if'? [-Wunknown-directives]
                                        # in in order to perform
                                          ^
2 warnings generated.

This is a comment within an assembler file that will be preprocessed (this is a 32-bit x86 build and the block is guarded by #ifdef __x86_64__ on line 500):

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/crypto/aesni-intel_asm.S?h=v5.18-rc7#n532

Is there anything that can be done to improve this heuristic for this case? I can try to push a patch to change the comment style for this one instance but I always worry that a warning of this nature will appear in the future and result in the kernel disabling this warning entirely.

In D124726#3516346, @nathanchance wrote:
I see an instance of this warning in the Linux kernel due to the "Now, for unknown directives inside a skipped conditional block, we diagnose the unknown directive as a warning if it is sufficiently similar to a directive specific to preprocessor conditional blocks" part of this change:
arch/x86/crypto/aesni-intel_asm.S:532:8: warning: invalid preprocessing directive, did you mean '#if'? [-Wunknown-directives]
                                        # in in order to perform
                                          ^
arch/x86/crypto/aesni-intel_asm.S:547:8: warning: invalid preprocessing directive, did you mean '#if'? [-Wunknown-directives]
                                        # in in order to perform
                                          ^
2 warnings generated.

Oh wow, that's a really neat one!

This is a comment within an assembler file that will be preprocessed (this is a 32-bit x86 build and the block is guarded by #ifdef __x86_64__ on line 500):

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/crypto/aesni-intel_asm.S?h=v5.18-rc7#n532

Is there anything that can be done to improve this heuristic for this case? I can try to push a patch to change the comment style for this one instance but I always worry that a warning of this nature will appear in the future and result in the kernel disabling this warning entirely.

Ah, and we don't get an error for those because of special logic: https://github.com/llvm/llvm-project/blob/main/clang/lib/Lex/PPDirectives.cpp#L1243 and it looks like we may need similar logic before issuing the warnings as well.

@nathanchance

Oops, thank you for your comment!
I would like to work on a hotfix patch for this issue.

@aaron.ballman

Should we entirely opt-out of this suggestion on .S files? I think there are other approaches here, such as decreasing the max edit distance to avoid suggesting this case in .S files, but this approach is avoiding just this edge case so that it would possibly bring a new issue like this. Conversely, opting out of this suggestion on the whole .S files will not be able to catch any typoes. Considering possible edge cases (# directives are also treated as comments), I think opting out would be the only way, unfortunately. What do you think?

For now, I will create a patch opting out of this suggestion on .S files.

In D124726#3516896, @ken-matsui wrote:

@nathanchance

Oops, thank you for your comment!
I would like to work on a hotfix patch for this issue.

@aaron.ballman

Should we entirely opt-out of this suggestion on .S files? I think there are other approaches here, such as decreasing the max edit distance to avoid suggesting this case in .S files, but this approach is avoiding just this edge case so that it would possibly bring a new issue like this. Conversely, opting out of this suggestion on the whole .S files will not be able to catch any typoes. Considering possible edge cases (# directives are also treated as comments), I think opting out would be the only way, unfortunately. What do you think?

For now, I will create a patch opting out of this suggestion on .S files.

I believe the if (getLangOpts().AsmPreprocessor) condition is appropriately accurate here. I don't think changing the edit distance would be a good behavior for anyone, and I don't see any other appropriate heuristics.

ken-matsui mentioned this in D125727: [clang] Avoid suggesting typoed directives in `.S` files.May 16 2022, 2:26 PM

I see. Thank you. I fixed the issue using its workaround.

https://reviews.llvm.org/D125727

nickdesaulniers mentioned this in rG45e01ce5fe6a: [clang] Avoid suggesting typoed directives in `.S` files.May 16 2022, 3:47 PM

Revision Contents

Path

Size

clang/

include/

clang/

Basic/

DiagnosticLexKinds.td

7 lines

Lex/

Preprocessor.h

5 lines

lib/

Lex/

PPDirectives.cpp

29 lines

test/

Preprocessor/

suggest-typoed-directive-c2x-cpp2b.c

29 lines

suggest-typoed-directive.c

28 lines

llvm/

include/

llvm/

ADT/

StringRef.h

19 lines

lib/

Support/

StringRef.cpp

37 lines

unittests/

ADT/

StringRefTest.cpp

29 lines

Diff 428395

clang/include/clang/Basic/DiagnosticLexKinds.td

Show First 20 Lines • Show All 353 Lines • ▼ Show 20 Lines

def pp_invalid_string_literal : Warning< def pp_invalid_string_literal : Warning<

"invalid string literal, ignoring final '\\'">; "invalid string literal, ignoring final '\\'">;

def warn_pp_expr_overflow : Warning< def warn_pp_expr_overflow : Warning<

"integer overflow in preprocessor expression">; "integer overflow in preprocessor expression">;

def warn_pp_convert_to_positive : Warning< def warn_pp_convert_to_positive : Warning<

"%select{left|right}0 side of operator converted from negative value to " "%select{left|right}0 side of operator converted from negative value to "

"unsigned: %1">; "unsigned: %1">;

def ext_pp_import_directive : Extension<"#import is a language extension">, def ext_pp_import_directive : Extension<"#import is a language extension">,

aaron.ballmanUnsubmitted

Not Done

Rather than adding a warning over the top of an existing error, I think we should modify err_pp_invalid_directive to have an optional "did you mean?" when we find a plausible typo to correct.

However, we do not issue that diagnostic when it's inside of a skipped conditional block, and that's what the crux of https://github.com/llvm/llvm-project/issues/51598 is about. As @rsmith pointed out in that issue (and I agree), compilers are free to support custom directives and those will validly appear inside of a conditional block that is skipped. We need to be careful not to diagnose those kinds of situations as an error. However, we still want to diagnose when the unknown directive is "sufficiently close" to another one which can control the conditional chain. e.g.,

#fi FOO // error unknown directive, did you mean #if?
#endfi // error unknown directive, did you mean #endif?

#if FOO
#esle // diag: unknown directive, did you mean #else?
#elfi // diag: unknown directive, did you mean #elif?
#elfidef // diag: unknown directive, did you mean #elifdef
#elinfdef // diag: unknown directive, did you mean #elifndef

#frobble // No diagnostic, not close enough to a conditional directive to warrant diagnosing
#eerp // No diagnostic, not close enough to a conditional directive to warrant diagnosing

#endif

Today, if FOO is defined to a nonzero value, you'll get diagnostics for all of those, but if FOO is not defined or is defined to 0, then there's no diagnostics. I think we want to consider directives that are *very likely* to be a typo (edit distance of 1, maybe 2?) for a conditional directive as a special case.

Currently, we only diagnose unknown directives as an error. I think for these special cased conditional directive diagnostics, we'll want to use a warning rather than an error in this circumstance (just in case it turns out to be a valid directive in a skipped conditional block). If we do go that route and make it a warning, I think the warning group should be -Wunknown-directives to mirror -Wunknown-pragmas, -Wunknown-attributes, etc and it should be defined to have the same text as the error case. e.g.,

def err_pp_invalid_directive : Error<
  "invalid preprocessing directive%select{|; did you mean '#%1'?}0"
>;
def warn_pp_invalid_directive : Warning<
  err_pp_invalid_directive.Text>, InGroup<DiagGroup<"unknown-directives">>;

WDYT?

(These were my thoughts before seeing the rest of the patch. After reading the patch, it looks like we have pretty similar ideas here, which is great, but leaving the comment anyway in case you have further opinions.)

aaron.ballman: Rather than adding a warning over the top of an existing error, I think we should modify…

ken-matsuiAuthorUnsubmitted

Not Done

Currently, we only diagnose unknown directives as an error. I think for these special cased conditional directive diagnostics, we'll want to use a warning rather than an error in this circumstance (just in case it turns out to be a valid directive in a skipped conditional block). If we do go that route and make it a warning, I think the warning group should be -Wunknown-directives to mirror -Wunknown-pragmas, -Wunknown-attributes, etc and it should be defined to have the same text as the error case. e.g.,
def err_pp_invalid_directive : Error<
  "invalid preprocessing directive%select{|; did you mean '#%1'?}0"
>;
def warn_pp_invalid_directive : Warning<
  err_pp_invalid_directive.Text>, InGroup<DiagGroup<"unknown-directives">>;
WDYT?

(These were my thoughts before seeing the rest of the patch. After reading the patch, it looks like we have pretty similar ideas here, which is great, but leaving the comment anyway in case you have further opinions.)

For now, I totally agree with deriving a new warning from err_pp_invalid_directive.

However, for future scalability, I think it would be great if we could split those diagnostics into Error & Warning and Help, for example. Rustc does split the diagnostics like the following, and I think this is quite clear. So, a bit apart from this patch, I speculate creating a diagnostic system that can split them would bring Clang diagnostics much more readable.

https://github.com/rust-lang/rust/blob/598d89bf142823b5d84e2eb0f0f9e418ee966a4b/src/test/ui/suggestions/suggest-trait-items.stderr

ken-matsui: > Currently, we only diagnose unknown directives as an error. I think for these special cased…

aaron.ballmanUnsubmitted

Done

However, for future scalability, I think it would be great if we could split those diagnostics into Error & Warning and Help, for example. Rustc does split the diagnostics like the following, and I think this is quite clear. So, a bit apart from this patch, I speculate creating a diagnostic system that can split them would bring Clang diagnostics much more readable.

CC @cjdb for awareness. (He's working on a project for improving the way in which we communicate diagnostics to users.)

aaron.ballman: > However, for future scalability, I think it would be great if we could split those…

ken-matsuiAuthorUnsubmitted

Not Done

Thank you for CCing him!

In D124726#3497284, @cjdb wrote:

Thanks for working on this!

@cjdb

You’re welcome!

I currently think rustc's diagnostics like the following are clear to users rather than attaching a help to a warning like unknown directive found, did you mean #Hoge?.

https://github.com/rust-lang/rust/blob/598d89bf142823b5d84e2eb0f0f9e418ee966a4b/src/test/ui/suggestions/suggest-trait-items.stderr

So, I think it would be better if we could implement diagnostics to be able to emit info like rustc does.
I also implemented a feature to show line numbers in diagnostics, and it may be under review.

https://reviews.llvm.org/D125078

Could I have your opinion here?

ken-matsui: Thank you for CCing him! >>! In D124726#3497284, @cjdb wrote: > Thanks for working on this!

InGroup<DiagGroup<"import-preprocessor-directive-pedantic">>; InGroup<DiagGroup<"import-preprocessor-directive-pedantic">>;

def err_pp_import_directive_ms : Error< def err_pp_import_directive_ms : Error<

"#import of type library is an unsupported Microsoft feature">; "#import of type library is an unsupported Microsoft feature">;

def ext_pp_include_search_ms : ExtWarn< def ext_pp_include_search_ms : ExtWarn<

"#include resolved using non-portable Microsoft search rules as: %0">, "#include resolved using non-portable Microsoft search rules as: %0">,

InGroup<MicrosoftInclude>; InGroup<MicrosoftInclude>;

def ext_pp_ident_directive : Extension<"#ident is a language extension">; def ext_pp_ident_directive : Extension<"#ident is a language extension">;

▲ Show 20 Lines • Show All 50 Lines • ▼ Show 20 Lines def warn_cxx98_compat_empty_fnmacro_arg : Warning<

"empty macro arguments are incompatible with C++98">, "empty macro arguments are incompatible with C++98">,

InGroup<CXX98CompatPedantic>, DefaultIgnore; InGroup<CXX98CompatPedantic>, DefaultIgnore;

def note_macro_here : Note<"macro %0 defined here">; def note_macro_here : Note<"macro %0 defined here">;

def note_macro_expansion_here : Note<"expansion of macro %0 requested here">; def note_macro_expansion_here : Note<"expansion of macro %0 requested here">;

def ext_pp_opencl_variadic_macros : Extension< def ext_pp_opencl_variadic_macros : Extension<

"variadic macros are a Clang extension in OpenCL">; "variadic macros are a Clang extension in OpenCL">;

def err_pp_invalid_directive : Error<"invalid preprocessing directive">; def err_pp_invalid_directive : Error<

"invalid preprocessing directive%select{|, did you mean '#%1'?}0"

aaron.ballmanUnsubmitted

Not Done

def err_pp_invalid_directive : Error<

- "invalid preprocessing directive%select{|, did you mean '#%1'?}0"

- >;

+ "invalid preprocessing directive%select{|, did you mean '#%1'?}0">;

def warn_pp_invalid_directive : Warning<

aaron.ballman:

def warn_pp_invalid_directive : Warning<

err_pp_invalid_directive.Text>, InGroup<DiagGroup<"unknown-directives">>;

def err_pp_directive_required : Error< def err_pp_directive_required : Error<

"%0 must be used within a preprocessing directive">; "%0 must be used within a preprocessing directive">;

def err_pp_file_not_found : Error<"'%0' file not found">, DefaultFatal; def err_pp_file_not_found : Error<"'%0' file not found">, DefaultFatal;

def err_pp_through_header_not_found : Error< def err_pp_through_header_not_found : Error<

"'%0' required for precompiled header not found">, DefaultFatal; "'%0' required for precompiled header not found">, DefaultFatal;

def err_pp_through_header_not_seen : Error< def err_pp_through_header_not_seen : Error<

"#include of '%0' not seen while attempting to " "#include of '%0' not seen while attempting to "

"%select{create|use}1 precompiled header">, DefaultFatal; "%select{create|use}1 precompiled header">, DefaultFatal;

▲ Show 20 Lines • Show All 450 Lines • Show Last 20 Lines

clang/include/clang/Lex/Preprocessor.h

Show First 20 Lines • Show All 2,235 Lines • ▼ Show 20 Lines	MacroInfo *ReadOptionalMacroParameterListAndBody(
const Token &MacroNameTok, bool ImmediatelyAfterHeaderGuard);		const Token &MacroNameTok, bool ImmediatelyAfterHeaderGuard);

/// The ( starting an argument list of a macro definition has just been read.		/// The ( starting an argument list of a macro definition has just been read.
/// Lex the rest of the parameters and the closing ), updating \p MI with		/// Lex the rest of the parameters and the closing ), updating \p MI with
/// what we learn and saving in \p LastTok the last token read.		/// what we learn and saving in \p LastTok the last token read.
/// Return true if an error occurs parsing the arg list.		/// Return true if an error occurs parsing the arg list.
bool ReadMacroParameterList(MacroInfo *MI, Token& LastTok);		bool ReadMacroParameterList(MacroInfo *MI, Token& LastTok);

		/// Provide a suggestion for a typoed directive. If there is no typo, then
		/// just skip suggesting.
		void SuggestTypoedDirective(const Token &Tok, StringRef Directive,
		const SourceLocation &endLoc);

/// We just read a \#if or related directive and decided that the		/// We just read a \#if or related directive and decided that the
/// subsequent tokens are in the \#if'd out portion of the		/// subsequent tokens are in the \#if'd out portion of the
/// file. Lex the rest of the file, until we see an \#endif. If \p		/// file. Lex the rest of the file, until we see an \#endif. If \p
/// FoundNonSkipPortion is true, then we have already emitted code for part of		/// FoundNonSkipPortion is true, then we have already emitted code for part of
/// this \#if directive, so \#else/\#elif blocks should never be entered. If		/// this \#if directive, so \#else/\#elif blocks should never be entered. If
/// \p FoundElse is false, then \#else directives are ok, if not, then we have		/// \p FoundElse is false, then \#else directives are ok, if not, then we have
/// already seen one so a \#else directive is a duplicate. When this returns,		/// already seen one so a \#else directive is a duplicate. When this returns,
/// the caller can lex the first valid token.		/// the caller can lex the first valid token.
▲ Show 20 Lines • Show All 372 Lines • Show Last 20 Lines

clang/lib/Lex/PPDirectives.cpp

Show First 20 Lines • Show All 427 Lines • ▼ Show 20 Lines assert(CurLexerBufferOffset >= HashFileOffset.second &&

"lexer is before the hash?"); "lexer is before the hash?");

// Take into account the fact that the lexer has already advanced, so the // Take into account the fact that the lexer has already advanced, so the

// number of bytes to skip must be adjusted. // number of bytes to skip must be adjusted.

unsigned LengthDiff = CurLexerBufferOffset - HashFileOffset.second; unsigned LengthDiff = CurLexerBufferOffset - HashFileOffset.second;

assert(BytesToSkip >= LengthDiff && "lexer is after the skipped range?"); assert(BytesToSkip >= LengthDiff && "lexer is after the skipped range?");

return BytesToSkip - LengthDiff; return BytesToSkip - LengthDiff;

} }

/// SuggestTypoedDirective - Provide a suggestion for a typoed directive. If

/// there is no typo, then just skip suggesting.

void Preprocessor::SuggestTypoedDirective(const Token &Tok, StringRef Directive,

const SourceLocation &EndLoc) {

std::vector<StringRef> Candidates = {

"if", "ifdef", "ifndef", "elif", "else", "endif"

erichkeaneUnsubmitted

Not Done

Should this be dependent on C modes? it would be kind of silly to suggest "invalid preprocessing token elifndef, did you mean elifndef"?

erichkeane: Should this be dependent on C modes? it would be kind of silly to suggest "invalid…

aaron.ballmanUnsubmitted

Not Done

Both C and C++ have the new directives and we enable them as an extension in all language modes, so no need to care about it being in C mode specifically. However, I think we do need to care about *which* C and C++ mode we're in so that we only suggest the new directives in C2x and C++2b mode.

aaron.ballman: Both C and C++ have the new directives and we enable them as an extension in all language modes…

};

if (LangOpts.C2x || LangOpts.CPlusPlus2b) {

Candidates.insert(Candidates.end(), {"elifdef", "elifndef"});

erichkeaneUnsubmitted

Not Done

I don't believe this meets our requirements for 'auto'.

erichkeane: I don't believe this meets our requirements for 'auto'.

ken-matsuiAuthorUnsubmitted

Done

Thank you. I replaced it with the actual type.

ken-matsui: Thank you. I replaced it with the actual type.

}

aaron.ballmanUnsubmitted

Not Done

"if", "ifdef", "ifndef", "elif", "else", "endif"

};

- if (LangOpts.C2x || LangOpts.CPlusPlus2b) {

+ if (LangOpts.C2x || LangOpts.CPlusPlus2b)

Candidates.insert(Candidates.end(), {"elifdef", "elifndef"});

- }

if (Optional<StringRef> Sugg = findSimilarStr(Directive, Candidates)) {

aaron.ballman:

if (Optional<std::string> Sugg = Directive.find_similar_str(Candidates)) {

aaron.ballmanUnsubmitted

Not Done

void Preprocessor::SuggestTypoedDirective(const Token &Tok, StringRef Directive,

- const SourceLocation &endLoc) {

- const std::array<std::string, 8> candidates{

- "if", "ifdef", "ifndef", "elif", "elifdef", "elifndef", "else", "endif"};

+ const SourceLocation &EndLoc) {

+ constexpr StringRef Candidates[] = {

+ "if", "ifdef", "ifndef", "elif", "elifdef", "elifndef", "else", "endif"

+ };

- if (const auto sugg =

+ if (auto Sugg =

tooling::levdistance::findSimilarStr(candidates, Directive)) {

CharSourceRange DirectiveRange =

- CharSourceRange::getCharRange(Tok.getLocation(), endLoc);

- const std::string suggValue = sugg.getValue();

+ CharSourceRange::getCharRange(Tok.getLocation(), EndLoc);

+ std::string SuggValue = Sugg.getValue();

auto Hint = FixItHint::CreateReplacement(DirectiveRange, "#" + suggValue);

Mostly just cleaning up for coding conventions, but also, no need to use a std::array and we typically don't use local top-level const qualification.

aaron.ballman: Mostly just cleaning up for coding conventions, but also, no need to use a `std::array` and we…

ken-matsuiAuthorUnsubmitted

Not Done

Thank you!

Just wondering, but is there any reason not to use the const qualifier?

ken-matsui: Thank you! Just wondering, but is there any reason not to use the `const` qualifier?

aaron.ballmanUnsubmitted

Done

Just wondering, but is there any reason not to use the const qualifier?

We have some quite inconsistent const correctness in our interfaces, so declaring local objects as const sometimes requires viral changes to make things compile. (I think we've mostly excised the uses of const_cast that would cast away constness in situations which weren't guaranteed to be safe in.)

aaron.ballman: > Just wondering, but is there any reason not to use the const qualifier? We have some quite…

ken-matsuiAuthorUnsubmitted

Done

Oh, I see thank you. This is a too difficult problem when working with many contributors though...

ken-matsui: Oh, I see thank you. This is a too difficult problem when working with many contributors though.

CharSourceRange DirectiveRange =

CharSourceRange::getCharRange(Tok.getLocation(), EndLoc);

std::string SuggValue = Sugg.getValue();

auto Hint = FixItHint::CreateReplacement(DirectiveRange, "#" + SuggValue);

Diag(Tok, diag::warn_pp_invalid_directive) << 1 << SuggValue << Hint;

}

/// SkipExcludedConditionalBlock - We just read a \#if or related directive and /// SkipExcludedConditionalBlock - We just read a \#if or related directive and

/// decided that the subsequent tokens are in the \#if'd out portion of the /// decided that the subsequent tokens are in the \#if'd out portion of the

/// file. Lex the rest of the file, until we see an \#endif. If /// file. Lex the rest of the file, until we see an \#endif. If

/// FoundNonSkipPortion is true, then we have already emitted code for part of /// FoundNonSkipPortion is true, then we have already emitted code for part of

/// this \#if directive, so \#else/\#elif blocks should never be entered. /// this \#if directive, so \#else/\#elif blocks should never be entered.

/// If ElseOk is true, then \#else directives are ok, if not, then we have /// If ElseOk is true, then \#else directives are ok, if not, then we have

/// already seen one so a \#else directive is a duplicate. When this returns, /// already seen one so a \#else directive is a duplicate. When this returns,

/// the caller can lex the first valid token. /// the caller can lex the first valid token.

▲ Show 20 Lines • Show All 107 Lines • ▼ Show 20 Lines if (Directive.startswith("if")) {

Sub == "def" || // "ifdef" Sub == "def" || // "ifdef"

Sub == "ndef") { // "ifndef" Sub == "ndef") { // "ifndef"

// We know the entire #if/#ifdef/#ifndef block will be skipped, don't // We know the entire #if/#ifdef/#ifndef block will be skipped, don't

// bother parsing the condition. // bother parsing the condition.

DiscardUntilEndOfDirective(); DiscardUntilEndOfDirective();

CurPPLexer->pushConditionalLevel(Tok.getLocation(), /*wasskipping*/true, CurPPLexer->pushConditionalLevel(Tok.getLocation(), /*wasskipping*/true,

/*foundnonskip*/false, /*foundnonskip*/false,

/*foundelse*/false); /*foundelse*/false);

} else {

SuggestTypoedDirective(Tok, Directive, endLoc);

} }

} else if (Directive[0] == 'e') { } else if (Directive[0] == 'e') {

StringRef Sub = Directive.substr(1); StringRef Sub = Directive.substr(1);

if (Sub == "ndif") { // "endif" if (Sub == "ndif") { // "endif"

PPConditionalInfo CondInfo; PPConditionalInfo CondInfo;

CondInfo.WasSkipping = true; // Silence bogus warning. CondInfo.WasSkipping = true; // Silence bogus warning.

bool InCond = CurPPLexer->popConditionalLevel(CondInfo); bool InCond = CurPPLexer->popConditionalLevel(CondInfo);

(void)InCond; // Silence warning in no-asserts mode. (void)InCond; // Silence warning in no-asserts mode.

▲ Show 20 Lines • Show All 133 Lines • ▼ Show 20 Lines if (Directive.startswith("if")) {

} }

// If this condition is true, enter it! // If this condition is true, enter it!

if (static_cast<bool>(MI) == IsElifDef) { if (static_cast<bool>(MI) == IsElifDef) {

CondInfo.FoundNonSkip = true; CondInfo.FoundNonSkip = true;

break; break;

} }

} else {

SuggestTypoedDirective(Tok, Directive, endLoc);

} }

} else {

SuggestTypoedDirective(Tok, Directive, endLoc);

} }

CurPPLexer->ParsingPreprocessorDirective = false; CurPPLexer->ParsingPreprocessorDirective = false;

// Restore comment saving mode. // Restore comment saving mode.

if (CurLexer) CurLexer->resetExtendedTokenMode(); if (CurLexer) CurLexer->resetExtendedTokenMode();

} }

// Finally, if we are out of the conditional (saw an #endif or ran off the end // Finally, if we are out of the conditional (saw an #endif or ran off the end

▲ Show 20 Lines • Show All 460 Lines • ▼ Show 20 Lines if (getLangOpts().AsmPreprocessor) {

// Enter this token stream so that we re-lex the tokens. Make sure to // Enter this token stream so that we re-lex the tokens. Make sure to

// enable macro expansion, in case the token after the # is an identifier // enable macro expansion, in case the token after the # is an identifier

// that is expanded. // that is expanded.

EnterTokenStream(std::move(Toks), 2, false, /*IsReinject*/false); EnterTokenStream(std::move(Toks), 2, false, /*IsReinject*/false);

return; return;

} }

// If we reached here, the preprocessing token is not valid! // If we reached here, the preprocessing token is not valid!

Diag(Result, diag::err_pp_invalid_directive); Diag(Result, diag::err_pp_invalid_directive) << 0;

aaron.ballmanUnsubmitted

Not Done

I think we should be attempting to suggest a typo for the error case as well e.g.,

#fi WHATEVER
#endif

we currently give no suggestion for that typo, just the error. However, this may require a fair amount of changes because of the various edge cases where we give better diagnostics than "unknown directive". e.g.,

#if WHATEVER // error: unterminated conditional directive
#endfi // no diagnostic

so if it looks like covering error cases is going to be involved, I'm fine doing it in a follow-up if you'd prefer.

aaron.ballman: I think we should be attempting to suggest a typo for the error case as well e.g., ``` #fi…

ken-matsuiAuthorUnsubmitted

Not Done

The former can be implemented easily, but the latter seems not easy.
So what about doing the latter in another patch?

ken-matsui: The former can be implemented easily, but the latter seems not easy. So what about doing the…

aaron.ballmanUnsubmitted

Not Done

I'm fine doing the error cases entirely in another patch. The current approach here is a problem because it issues a warning and an error for the same line of code. So I'd go back to the way the code was before, and in a follow-up you can handle errors.

I think one approach to consider for that is having an argument to SuggestTypoedDirective() as to whether to err or warn so that we can reuse the same logic.

aaron.ballman: I'm fine doing the error cases entirely in another patch. The current approach here is a…

ken-matsuiAuthorUnsubmitted

Done

OK, reverting 👍

I'll work on it. Thank you.

ken-matsui: OK, reverting :+1: I'll work on it. Thank you.

// Read the rest of the PP line. // Read the rest of the PP line.

DiscardUntilEndOfDirective(); DiscardUntilEndOfDirective();

// Okay, we're done parsing the directive. // Okay, we're done parsing the directive.

} }

/// GetLineValue - Convert a numeric token into an unsigned value, emitting /// GetLineValue - Convert a numeric token into an unsigned value, emitting

▲ Show 20 Lines • Show All 2,117 Lines • Show Last 20 Lines

clang/test/Preprocessor/suggest-typoed-directive-c2x-cpp2b.c

This file was added.

				// RUN: %clang_cc1 -std=c2x -fsyntax-only -verify %s
				// RUN: %clang_cc1 -x c++ -std=c++2b -fsyntax-only -verify %s

				// id: not suggested to '#if'
				// ifd: expected-warning@+11 {{invalid preprocessing directive, did you mean '#if'?}}
				// ifde: expected-warning@+11 {{invalid preprocessing directive, did you mean '#ifdef'?}}
				// elify: expected-warning@+11 {{invalid preprocessing directive, did you mean '#elif'?}}
				// elsif: expected-warning@+11 {{invalid preprocessing directive, did you mean '#elif'?}}
				// elseif: expected-warning@+11 {{invalid preprocessing directive, did you mean '#elif'?}}
				// elfidef: expected-warning@+11 {{invalid preprocessing directive, did you mean '#elifdef'?}}
				// elfindef: expected-warning@+11 {{invalid preprocessing directive, did you mean '#elifndef'?}}
				// elses: expected-warning@+11 {{invalid preprocessing directive, did you mean '#else'?}}
				// endi: expected-warning@+11 {{invalid preprocessing directive, did you mean '#endif'?}}
				#ifdef UNDEFINED
				#id
				#ifd
				#ifde
				#elify
				#elsif
				#elseif
				#elfidef
				#elfindef
				#elses
				#endi
				#endif

				#if special_compiler
				#special_compiler_directive // no diagnostic
				#endif

clang/test/Preprocessor/suggest-typoed-directive.c

This file was added.

				// RUN: %clang_cc1 -fsyntax-only -verify %s

				aaron.ballmanUnsubmitted Not Done Reply Inline Actions I'd drop the `-pedantic` from here, then we don't need the expected warning on the last line. I think you should drop the `-std=c99` as well so there's a RUN line that's not tied to a specific release, then I'd add a RUN line explicitly for c2x. aaron.ballman: I'd drop the `-pedantic` from here, then we don't need the expected warning on the last line.
				// id: not suggested to '#if'
				aaron.ballmanUnsubmitted Not Done Reply Inline Actions This still suggests that something's wrong as I would imagine this would have an edit distance of 1. Oh, interesting... setting the replacement option to `false` may have made things better for the `elfindef` case but worse for the `id` case? This is tricky because we want to identify things that are most likely simple typos but exclude things that may reasonably not be a typo but a custom preprocessor directive. Based on that, I think setting the replacement option to `true` gives the more conservative answer (it treats a replacement as 1 edit rather than 2). @erichkeane -- do you have thoughts? aaron.ballman: This still suggests that something's wrong as I would imagine this would have an edit distance…
				erichkeaneUnsubmitted Not Done Reply Inline Actions Not particularly. I don't have a good hold of how much we want to suggest with the 'did you mean'. Line 9 and line 10 here are unfortunate, I would hope those would happen? Its unfortunate we don't have a way to figure out these common typos. erichkeane: Not particularly. I don't have a good hold of how much we want to suggest with the 'did you…
				aaron.ballmanUnsubmitted Not Done Reply Inline Actions Yeah, ideally I want this to "do what I'm thinking" and make all three situations work. :-D aaron.ballman: Yeah, ideally I want this to "do what I'm thinking" and make all three situations work. :-D
				// ifd: expected-warning@+11 {{invalid preprocessing directive, did you mean '#if'?}}
				// ifde: expected-warning@+11 {{invalid preprocessing directive, did you mean '#ifdef'?}}
				// elify: expected-warning@+11 {{invalid preprocessing directive, did you mean '#elif'?}}
				// elsif: expected-warning@+11 {{invalid preprocessing directive, did you mean '#elif'?}}
				// elseif: expected-warning@+11 {{invalid preprocessing directive, did you mean '#elif'?}}
				// elfidef: not suggested to '#elifdef'
				// elfindef: not suggested to '#elifndef'
				// elses: expected-warning@+11 {{invalid preprocessing directive, did you mean '#else'?}}
				aaron.ballmanUnsubmitted Not Done Reply Inline Actions It's interesting that this one suggested `#elifdef` instead of `#elifndef` -- is there anything that can be done for that? Also, one somewhat interesting question is whether we want to recommend `#elifdef` and `#elifndef` outside of C2x/C++2b mode. Those directives only exist in the latest language standard, but Clang supports them as a conforming extension in all language modes. Given that this diagnostic is about typos, I think I'm okay suggesting the directives even in older language modes. That's as likely to be a correct suggestion as not, IMO. aaron.ballman: It's interesting that this one suggested `#elifdef` instead of `#elifndef` -- is there anything…
				ken-matsuiAuthorUnsubmitted Not Done Reply Inline Actions It's interesting that this one suggested `#elifdef` instead of `#elifndef` -- is there anything that can be done for that? I found I have to use `std::min_element` instead of `std::max_element` because we are finding the nearest (most minimum distance) string. Meanwhile, `#elfindef` has 2 distance with both `#elifdef` and `#elifndef`. After replacing `std::max_element` with `std::min_element`, I could suggest `#elifndef` from `#elfinndef`. Also, one somewhat interesting question is whether we want to recommend `#elifdef` and `#elifndef` outside of C2x/C++2b mode. Those directives only exist in the latest language standard, but Clang supports them as a conforming extension in all language modes. Given that this diagnostic is about typos, I think I'm okay suggesting the directives even in older language modes. That's as likely to be a correct suggestion as not, IMO. I agree with you because Clang implements those directives, and the suggested code will also be compilable. I personally think only not compilable suggestions should be avoided. (Or, we might place additional info for outside of C2x/C++2b mode like `this is a C2x/C++2b feature but compilable on Clang`?) According to the algorithm of `std::min_element`, we only get an iterator of the first element even if there is another element that has the same distance. So, `#elfindef` only suggests `#elifdef` in accordance with the order of `Candidates`, and I don't think it is beautiful to depend on the order of candidates. I would say that we can suggest all the same distance like the following, but I'm not sure this is preferable: #elfindef // diag: unknown directive, did you mean #elifdef or #elifndef? ken-matsui: > It's interesting that this one suggested `#elifdef` instead of `#elifndef` -- is there…
				aaron.ballmanUnsubmitted Not Done Reply Inline Actions I agree with you because Clang implements those directives, and the suggested code will also be compilable. I personally think only not compilable suggestions should be avoided. (Or, we might place additional info for outside of C2x/C++2b mode like this is a C2x/C++2b feature but compilable on Clang?) I may be changing my mind on this a bit. I now see we don't issue an extension warning when using `#elifdef` or `#elifndef` in older language modes. That means suggesting those will be correct but only for Clang, and the user won't have any other diagnostics to tell them about the portability issue. And those particular macros are reasonably likely to be used in a header where the user is trying to aim for portability. So I'm starting to think we should only suggest those two in C2x mode (and we should probably add a portability warning for #elifdef and #elifndef, so I filed: https://github.com/llvm/llvm-project/issues/55306) I would say that we can suggest all the same distance like the following, but I'm not sure this is preferable: The way we typically handle this is to attach FixIt hints to a note instead of to the diagnostic. This way, automatic fixes aren't applied (because there are multiple choices to pick from) but the user is still able to apply whichever fix they want in an IDE or other tool. It might be worth trying that approach (e.g., if there's only one candidate, attach it to the warning, and if there are two or more, emit a warning without a "did you mean" in it and use a new note for the fixit. e.g., #elfindef // expected-warning {{invalid preprocessing directive}} \ expected-note {{did you mean '#elifdef'?}} \ expected-note {{did you mean '#elifndef'?}} WDYT? aaron.ballman: > I agree with you because Clang implements those directives, and the suggested code will also…
				ken-matsuiAuthorUnsubmitted Not Done Reply Inline Actions I may be changing my mind on this a bit. I now see we don't issue an extension warning when using `#elifdef` or `#elifndef` in older language modes. That means suggesting those will be correct but only for Clang, and the user won't have any other diagnostics to tell them about the portability issue. And those particular macros are reasonably likely to be used in a header where the user is trying to aim for portability. So I'm starting to think we should only suggest those two in C2x mode (and we should probably add a portability warning for #elifdef and #elifndef, so I filed: https://github.com/llvm/llvm-project/issues/55306) Certainly, it would be less confusing to the user to avoid suggestions regarding those two. I'm going to fix my code to avoid suggesting them in not C2x mode. Thank you for submitting the issue, I also would like to work on it. The way we typically handle this is to attach FixIt hints to a note instead of to the diagnostic. This way, automatic fixes aren't applied (because there are multiple choices to pick from) but the user is still able to apply whichever fix they want in an IDE or other tool. It might be worth trying that approach (e.g., if there's only one candidate, attach it to the warning, and if there are two or more, emit a warning without a "did you mean" in it and use a new note for the fixit. e.g., #elfindef // expected-warning {{invalid preprocessing directive}} \ expected-note {{did you mean '#elifdef'?}} \ expected-note {{did you mean '#elifndef'?}} WDYT? This is really cool, but don't you care about the redundancy of `did you mean` in terms of the llvm team culture? If not, I will implement notes like the above. ken-matsui: > I may be changing my mind on this a bit. I now see we don't issue an extension warning when…
				aaron.ballmanUnsubmitted Not Done Reply Inline Actions Certainly, it would be less confusing to the user to avoid suggestions regarding those two. I'm going to fix my code to avoid suggesting them in not C2x mode. +1, thank you! This is really cool, but don't you care about the redundancy of did you mean in terms of the llvm team culture? If not, I will implement notes like the above. I would care if the list were potentially unbounded (like, say, with identifiers in general), but because we know this list will only have a max of two entries on it in this case, it seems reasonable to me. I double-checked with @erichkeane to see if he thought it would be an issue, and he agreed that it being a fixed list makes it pretty reasonable. However, that discussion did raise a question -- why are there two suggestions? elifdef requires a swap + delete and elifndef requires just a swap, so we would have thought that it would have been the only option in the list. aaron.ballman: > Certainly, it would be less confusing to the user to avoid suggestions regarding those two.
				ken-matsuiAuthorUnsubmitted Not Done Reply Inline Actions With the implementation of Lev distances used in llvm, I could simply suggest `#elifdef` from `#elfidef` and `#elifndef` from `#elfindef`. So, in this situation, do you think that we still need to add two notes for conflicted distances? ken-matsui: With the implementation of Lev distances used in llvm, I could simply suggest `#elifdef` from…
				aaron.ballmanUnsubmitted Not Done Reply Inline Actions No, let's skip the two note behavior. If we find ourselves with multiple suggestions, we'll just leave off the "did you mean?" part of the diagnostic entirely. aaron.ballman: No, let's skip the two note behavior. If we find ourselves with multiple suggestions, we'll…
				erichkeaneUnsubmitted Not Done Reply Inline Actions Might I suggest we just emit the 'first' suggestion? This isn't really perfect, but I would think that our users put very little thought into it when we suggest the wrong thing. erichkeane: Might I suggest we just emit the 'first' suggestion? This isn't really perfect, but I would…
				aaron.ballmanUnsubmitted Not Done Reply Inline Actions Notionally that makes sense, but the situation I'm worried about is when users pass `-fixit` to the compiler -- then it automatically does whatever, but we have no idea whether that whatever is reasonable or not when we have more than one option. aaron.ballman: Notionally that makes sense, but the situation I'm worried about is when users pass `-fixit` to…
				ken-matsuiAuthorUnsubmitted Not Done Reply Inline Actions I could suggest `#if` from `#id`, but I had to change `#elfindef` to `#elfinndef` to suggest `#elifndef`. After setting the `AllowReplacements` option to `true`, `#elfindef` suggested `#elifdef` (I think it might have the same distance with `#elifndef`). ken-matsui: I could suggest `#if` from `#id`, but I had to change `#elfindef` to `#elfinndef` to suggest…
				aaron.ballmanUnsubmitted Not Done Reply Inline Actions Thanks! I'm coming around more to the idea that we'll never get the "perfect" set of suggestions, and so whatever suggestions we get are quite likely good enough. However, I think it would be helpful to also add the tests where the behavior seems a bit strange, like `#elfindef`, just to show that we know the behavior is what it is and it doesn't do bad things like crash or whatnot. aaron.ballman: Thanks! I'm coming around more to the idea that we'll never get the "perfect" set of…
				ken-matsuiAuthorUnsubmitted Done Reply Inline Actions Thank you :) Ok, I'll add the test as well. ken-matsui: Thank you :) Ok, I'll add the test as well.
				// endi: expected-warning@+11 {{invalid preprocessing directive, did you mean '#endif'?}}
				#ifdef UNDEFINED
				#id
				#ifd
				#ifde
				#elify
				#elsif
				#elseif
				#elfidef
				#elfindef
				#elses
				#endi
				#endif

				#if special_compiler
				#special_compiler_directive // no diagnostic
				#endif
				ken-matsuiAuthorUnsubmitted Done Reply Inline Actions Here, `#elfindef` is suggested to `#elifdef`, not `#elifndef`, but I believe it will help many users to realize that they have typo'd `#elifndef`, or maybe they might have meant actually `#elifdef`. ken-matsui: Here, `#elfindef` is suggested to `#elifdef`, not `#elifndef`, but I believe it will help many…
				aaron.ballmanUnsubmitted Not Done Reply Inline Actions +1 aaron.ballman: +1

llvm/include/llvm/ADT/StringRef.h

//===- StringRef.h - Constant String Reference Wrapper ----------- C++ --===//		//===- StringRef.h - Constant String Reference Wrapper ----------- C++ --===//
//		//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.		// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.		// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception		// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

#ifndef LLVM_ADT_STRINGREF_H		#ifndef LLVM_ADT_STRINGREF_H
#define LLVM_ADT_STRINGREF_H		#define LLVM_ADT_STRINGREF_H

#include "llvm/ADT/DenseMapInfo.h"		#include "llvm/ADT/DenseMapInfo.h"
		#include "llvm/ADT/Optional.h"
#include "llvm/ADT/STLFunctionalExtras.h"		#include "llvm/ADT/STLFunctionalExtras.h"
#include "llvm/ADT/iterator_range.h"		#include "llvm/ADT/iterator_range.h"
#include "llvm/Support/Compiler.h"		#include "llvm/Support/Compiler.h"
#include <algorithm>		#include <algorithm>
#include <cassert>		#include <cassert>
#include <cstddef>		#include <cstddef>
#include <cstring>		#include <cstring>
#include <limits>		#include <limits>
#include <string>		#include <string>
#if __cplusplus > 201402L		#if __cplusplus > 201402L
#include <string_view>		#include <string_view>
#endif		#endif
#include <type_traits>		#include <type_traits>
#include <utility>		#include <utility>
		#include <vector>

// Declare the __builtin_strlen intrinsic for MSVC so it can be used in		// Declare the __builtin_strlen intrinsic for MSVC so it can be used in
// constexpr context.		// constexpr context.
#if defined(_MSC_VER)		#if defined(_MSC_VER)
extern "C" size_t __builtin_strlen(const char *);		extern "C" size_t __builtin_strlen(const char *);
#endif		#endif

namespace llvm {		namespace llvm {
▲ Show 20 Lines • Show All 200 Lines • ▼ Show 20 Lines	#endif
/// \returns the minimum number of character insertions, removals,		/// \returns the minimum number of character insertions, removals,
/// or (if \p AllowReplacements is \c true) replacements needed to		/// or (if \p AllowReplacements is \c true) replacements needed to
/// transform one of the given strings into the other. If zero,		/// transform one of the given strings into the other. If zero,
/// the strings are identical.		/// the strings are identical.
LLVM_NODISCARD		LLVM_NODISCARD
unsigned edit_distance(StringRef Other, bool AllowReplacements = true,		unsigned edit_distance(StringRef Other, bool AllowReplacements = true,
unsigned MaxEditDistance = 0) const;		unsigned MaxEditDistance = 0) const;

		/// Find a similar string in `Candidates`.
		///
		/// \param Candidates the candidates to find a similar string.
		///
		/// \param Dist Maximum distance to elect as similar strings. Eventually,
		/// a string with the smallest distance (= most similar string) among them
		/// will be returned.
		/// If non-zero, use it as maximum distance; otherwise, if the LHS size is
		/// less than 3, use the LHS size minus 1 and if not, use the LHS size
		/// divided by 3.
		///
		/// \returns a similar string if exists. If no similar string exists,
		/// returns None.
		Optional<std::string> find_similar_str(
		const std::vector<StringRef> &Candidates,
		size_t Dist = 0) const;

/// str - Get the contents as an std::string.		/// str - Get the contents as an std::string.
LLVM_NODISCARD		LLVM_NODISCARD
std::string str() const {		std::string str() const {
if (!Data) return std::string();		if (!Data) return std::string();
return std::string(Data, Length);		return std::string(Data, Length);
}		}
		ken-matsuiAuthorUnsubmitted Not Done Reply Inline Actions It seems my previous comment that I posted on deleted file disappeared. Could you please check the following link comment? https://reviews.llvm.org/D124726#3493222 ken-matsui: It seems my previous comment that I posted on deleted file disappeared. Could you please check…
		aaron.ballmanUnsubmitted Not Done Reply Inline Actions Thank you for calling this out, I had missed it! I'll respond here. Should I keep going with this implementation or not? I don't think we should add another implementation of Levenshtein distances; we should stick with the base functionality, which is what's used by the existing typo correction logic: https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaLookup.cpp#L4308 Ideally (but I'm not asking you to do this because it's likely a very big ask), we'd generalize the `TypoCorrectionConsumer` and related classes so that it can be used during Lex, Parse, or Sema with callbacks from the typo correction consumer to obtain the list of potential names for correction. However, that functionality is so tightly tied to Sema, it may not be particularly plausible. One thing I noticed is that your implementation does not allow replacements, while the typo correction does. I think that likely explains the behavioral difference you're seeing between implementations. aaron.ballman: Thank you for calling this out, I had missed it! I'll respond here. > Should I keep going with…
		ken-matsuiAuthorUnsubmitted Not Done Reply Inline Actions Thank you. I updated the test to fit the existing implementation. The suggestion for `#id` couldn't be suggested as `#if`. ken-matsui: Thank you. I updated the test to fit the existing implementation. The suggestion for `#id`…

/// @}		/// @}
/// @name Operator Overloads		/// @name Operator Overloads
/// @{		/// @{

LLVM_NODISCARD		LLVM_NODISCARD
char operator[](size_t Index) const {		char operator[](size_t Index) const {
assert(Index < Length && "Invalid index!");		assert(Index < Length && "Invalid index!");
▲ Show 20 Lines • Show All 739 Lines • Show Last 20 Lines

llvm/lib/Support/StringRef.cpp

Show First 20 Lines • Show All 92 Lines • ▼ Show 20 Lines	unsigned StringRef::edit_distance(llvm::StringRef Other,
bool AllowReplacements,		bool AllowReplacements,
unsigned MaxEditDistance) const {		unsigned MaxEditDistance) const {
return llvm::ComputeEditDistance(		return llvm::ComputeEditDistance(
makeArrayRef(data(), size()),		makeArrayRef(data(), size()),
makeArrayRef(Other.data(), Other.size()),		makeArrayRef(Other.data(), Other.size()),
AllowReplacements, MaxEditDistance);		AllowReplacements, MaxEditDistance);
}		}

		// Find a similar string in `Candidates`.
		Optional<std::string> StringRef::find_similar_str(const std::vector<StringRef> &Candidates, size_t Dist) const {
		aaron.ballmanUnsubmitted Not Done Reply Inline Actions I am less convinced that we want this as a general interface on `StringRef`. I think callers need to decide whether something is similar enough or not. For example, case sensitivity may not matter for this case but it might matter to another case. Others may wish to limit the max edit_distance for each of the candidates for performance reasons, others may not. We already saw there were questions about whether to allow replacements or not. That sort of thing. I think this functionality should be moved to PPDirectives.cpp as a static helper function that's specific to this use. Also, because this returns one of the candidates passed in, and those are a `StringRef`, I think this function should return a `StringRef` -- this removes allocation costs. I'd also drop the `Dist` parameter as the function is no longer intended to be a general purpose one. aaron.ballman: I am less convinced that we want this as a general interface on `StringRef`. I think callers…
		ken-matsuiAuthorUnsubmitted Not Done Reply Inline Actions I am also in favor of it, but where should I put tests for this functionality? Is it possible to write unit tests for static functions implemented in `.cpp` files? ken-matsui: I am also in favor of it, but where should I put tests for this functionality? Is it possible…
		aaron.ballmanUnsubmitted Not Done Reply Inline Actions This is something we wouldn't usually add tests for directly but instead test the behavior of through lit -- the tests you already have exercise this code path and would show if there's a situation where we expect a viable candidate and don't find any. So I'd not worry about test coverage for it in this case. aaron.ballman: This is something we wouldn't usually add tests for directly but instead test the behavior of…
		ken-matsuiAuthorUnsubmitted Done Reply Inline Actions I see. Thank you! ken-matsui: I see. Thank you!
		// We need to check if `rng` has the exact case-insensitive string because the Levenshtein distance match does not
		// care about it.
		for (StringRef C : Candidates) {
		if (equals_insensitive(C)) {
		return C.str();
		}
		}

		// Keep going with the Levenshtein distance match.
		// If dist is given, use the dist for maxDist; otherwise, if the LHS size is less than 3, use the LHS size minus 1
		// and if not, use the LHS size divided by 3.
		size_t MaxDist = Dist != 0 ? Dist
		: Length < 3 ? Length - 1
		: Length / 3;

		std::vector<std::pair<std::string, size_t>> Cand;
		for (StringRef C : Candidates) {
		size_t CurDist = edit_distance(C, false);
		if (CurDist <= MaxDist) {
		Cand.emplace_back(C, CurDist);
		}
		}

		if (Cand.empty()) {
		return None;
		} else if (Cand.size() == 1) {
		return Cand[0].first;
		} else {
		auto SimilarStr = std::min_element(
		Cand.cbegin(), Cand.cend(),
		[](const auto &A, const auto &B) { return A.second < B.second; });
		return SimilarStr->first;
		}
		aaron.ballmanUnsubmitted Not Done Reply Inline Actions I think we can skip the vector entirely by keeping track of the min edit distance we've seen thus far and which candidate that corresponds to when looping over the candidates. aaron.ballman: I think we can skip the vector entirely by keeping track of the min edit distance we've seen…
		ken-matsuiAuthorUnsubmitted Done Reply Inline Actions 👍 ken-matsui: :+1:
		}

//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
// String Operations		// String Operations
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//

std::string StringRef::lower() const {		std::string StringRef::lower() const {
return std::string(map_iterator(begin(), toLower),		return std::string(map_iterator(begin(), toLower),
map_iterator(end(), toLower));		map_iterator(end(), toLower));
}		}
▲ Show 20 Lines • Show All 499 Lines • Show Last 20 Lines

llvm/unittests/ADT/StringRefTest.cpp

Show First 20 Lines • Show All 560 Lines • ▼ Show 20 Lines	TEST(StringRefTest, Count) {
EXPECT_EQ(1U, OverlappingAbba.count("abba"));		EXPECT_EQ(1U, OverlappingAbba.count("abba"));
StringRef NonOverlappingAbba("abbaabba");		StringRef NonOverlappingAbba("abbaabba");
EXPECT_EQ(2U, NonOverlappingAbba.count("abba"));		EXPECT_EQ(2U, NonOverlappingAbba.count("abba"));
StringRef ComplexAbba("abbabbaxyzabbaxyz");		StringRef ComplexAbba("abbabbaxyzabbaxyz");
EXPECT_EQ(2U, ComplexAbba.count("abba"));		EXPECT_EQ(2U, ComplexAbba.count("abba"));
}		}

TEST(StringRefTest, EditDistance) {		TEST(StringRefTest, EditDistance) {
		EXPECT_EQ(0U, StringRef("aaaa").edit_distance("aaaa"));
		EXPECT_EQ(1U, StringRef("aaaa").edit_distance("aaab"));
		EXPECT_EQ(2U, StringRef("aabc").edit_distance("aacb"));
		EXPECT_EQ(2U, StringRef("aabc").edit_distance("abca"));
		EXPECT_EQ(3U, StringRef("aabc").edit_distance("adef"));
		EXPECT_EQ(4U, StringRef("abcd").edit_distance("efgh"));

StringRef Hello("hello");		StringRef Hello("hello");
EXPECT_EQ(2U, Hello.edit_distance("hill"));		EXPECT_EQ(2U, Hello.edit_distance("hill"));

StringRef Industry("industry");		StringRef Industry("industry");
EXPECT_EQ(6U, Industry.edit_distance("interest"));		EXPECT_EQ(6U, Industry.edit_distance("interest"));

StringRef Soylent("soylent green is people");		StringRef Soylent("soylent green is people");
EXPECT_EQ(19U, Soylent.edit_distance("people soiled our green"));		EXPECT_EQ(19U, Soylent.edit_distance("people soiled our green"));
EXPECT_EQ(26U, Soylent.edit_distance("people soiled our green",		EXPECT_EQ(26U, Soylent.edit_distance("people soiled our green",
/* allow replacements = */ false));		/* allow replacements = */ false));
EXPECT_EQ(9U, Soylent.edit_distance("people soiled our green",		EXPECT_EQ(9U, Soylent.edit_distance("people soiled our green",
/* allow replacements = */ true,		/* allow replacements = */ true,
/* max edit distance = */ 8));		/* max edit distance = */ 8));
EXPECT_EQ(53U, Soylent.edit_distance("people soiled our green "		EXPECT_EQ(53U, Soylent.edit_distance("people soiled our green "
"people soiled our green "		"people soiled our green "
"people soiled our green "));		"people soiled our green "));
}		}

		TEST(StringRefTest, FindSimilarStr) {
		{
		std::vector<StringRef> Candidates{"b", "c"};
		EXPECT_EQ(None, StringRef("a").find_similar_str(Candidates));
		}
		{ // macros
		std::vector<StringRef> Candidates{
		"if", "ifdef", "ifndef", "elif", "elifdef", "elifndef", "else", "endif"
		};
		EXPECT_EQ(std::string("elifdef"), StringRef("elfidef").find_similar_str(Candidates));
		EXPECT_EQ(std::string("elifdef"), StringRef("elifdef").find_similar_str(Candidates));
		EXPECT_EQ(None, StringRef("special_compiler_directive").find_similar_str(Candidates));
		}
		{ // case-insensitive
		std::vector<StringRef> Candidates{
		"if", "ifdef", "ifndef", "elif", "elifdef", "elifndef", "else", "endif"
		};
		EXPECT_EQ(std::string("elifdef"), StringRef("elifdef").find_similar_str(Candidates));
		EXPECT_EQ(std::string("elifdef"), StringRef("ELIFDEF").find_similar_str(Candidates));
		}
		}

TEST(StringRefTest, Misc) {		TEST(StringRefTest, Misc) {
std::string Storage;		std::string Storage;
raw_string_ostream OS(Storage);		raw_string_ostream OS(Storage);
OS << StringRef("hello");		OS << StringRef("hello");
EXPECT_EQ("hello", OS.str());		EXPECT_EQ("hello", OS.str());
}		}

TEST(StringRefTest, Hashing) {		TEST(StringRefTest, Hashing) {
▲ Show 20 Lines • Show All 551 Lines • Show Last 20 Lines

This is an archive of the discontinued LLVM Phabricator instance.

Suggest typoed directives in preprocessor conditionalsClosedPublic

Details

Diff Detail

Unit TestsFailed

Event Timeline

Revision Contents

Diff 428395

clang/include/clang/Basic/DiagnosticLexKinds.td

clang/include/clang/Lex/Preprocessor.h

clang/lib/Lex/PPDirectives.cpp

clang/test/Preprocessor/suggest-typoed-directive-c2x-cpp2b.c

clang/test/Preprocessor/suggest-typoed-directive.c

llvm/include/llvm/ADT/StringRef.h

llvm/lib/Support/StringRef.cpp

llvm/unittests/ADT/StringRefTest.cpp

Suggest typoed directives in preprocessor conditionals
ClosedPublic