This is an archive of the discontinued LLVM Phabricator instance.

Suggest typoed directives in preprocessor conditionals
ClosedPublic

Authored by ken-matsui on Apr 30 2022, 8:34 PM.

Details

Summary

This patch implements suggestion for typoed directives in preprocessor conditionals. The related issue is: https://github.com/llvm/llvm-project/issues/51598.

Diff Detail

Event Timeline

ken-matsui created this revision.Apr 30 2022, 8:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 30 2022, 8:34 PM
Herald added a subscriber: mgorny. · View Herald Transcript
ken-matsui requested review of this revision.Apr 30 2022, 8:34 PM
Herald added a project: Restricted Project. · View Herald TranscriptApr 30 2022, 8:34 PM
Herald added a subscriber: cfe-commits. · View Herald Transcript
ken-matsui added inline comments.Apr 30 2022, 8:50 PM
clang/test/OpenMP/predefined_macro.c
7 ↗(On Diff #426262)

I am not sure if this typo was intended.

When I renamed elsif to elif, #error "_OPENMP has incorrect value" on line 13 was evaluated.

Therefore, if this typo was intended, just suppressing the warning with expected-warning would be better. However, if this typo was NOT intended, I think I should make _OPENMP equal to 201107. It is also possible that this test was mistakenly written.

Thank you for working on this!

clang/include/clang/Basic/DiagnosticGroups.td
1023–1024 ↗(On Diff #426262)

We don't typically use "typo" in the user-facing part of diagnostics and this group doesn't seem likely to be reused, so I would remove it (another comment on this elsewhere).

clang/include/clang/Basic/DiagnosticLexKinds.td
362–363

Rather than adding a warning over the top of an existing error, I think we should modify err_pp_invalid_directive to have an optional "did you mean?" when we find a plausible typo to correct.

However, we do not issue that diagnostic when it's inside of a skipped conditional block, and that's what the crux of https://github.com/llvm/llvm-project/issues/51598 is about. As @rsmith pointed out in that issue (and I agree), compilers are free to support custom directives and those will validly appear inside of a conditional block that is skipped. We need to be careful not to diagnose those kinds of situations as an error. However, we still want to diagnose when the unknown directive is "sufficiently close" to another one which can control the conditional chain. e.g.,

#fi FOO // error unknown directive, did you mean #if?
#endfi // error unknown directive, did you mean #endif?

#if FOO
#esle // diag: unknown directive, did you mean #else?
#elfi // diag: unknown directive, did you mean #elif?
#elfidef // diag: unknown directive, did you mean #elifdef
#elinfdef // diag: unknown directive, did you mean #elifndef

#frobble // No diagnostic, not close enough to a conditional directive to warrant diagnosing
#eerp // No diagnostic, not close enough to a conditional directive to warrant diagnosing

#endif

Today, if FOO is defined to a nonzero value, you'll get diagnostics for all of those, but if FOO is not defined or is defined to 0, then there's no diagnostics. I think we want to consider directives that are *very likely* to be a typo (edit distance of 1, maybe 2?) for a conditional directive as a special case.

Currently, we only diagnose unknown directives as an error. I think for these special cased conditional directive diagnostics, we'll want to use a warning rather than an error in this circumstance (just in case it turns out to be a valid directive in a skipped conditional block). If we do go that route and make it a warning, I think the warning group should be -Wunknown-directives to mirror -Wunknown-pragmas, -Wunknown-attributes, etc and it should be defined to have the same text as the error case. e.g.,

def err_pp_invalid_directive : Error<
  "invalid preprocessing directive%select{|; did you mean '#%1'?}0"
>;
def warn_pp_invalid_directive : Warning<
  err_pp_invalid_directive.Text>, InGroup<DiagGroup<"unknown-directives">>;

WDYT?

(These were my thoughts before seeing the rest of the patch. After reading the patch, it looks like we have pretty similar ideas here, which is great, but leaving the comment anyway in case you have further opinions.)

clang/include/clang/Tooling/LevDistance.h
1 ↗(On Diff #426262)
clang/lib/Lex/PPDirectives.cpp
484–492

Mostly just cleaning up for coding conventions, but also, no need to use a std::array and we typically don't use local top-level const qualification.

clang/test/OpenMP/predefined_macro.c
7 ↗(On Diff #426262)

I tracked down the root cause here and fixed the bug in 50b51b1860acbfb775d5e2eee3310e25c635d667, so when you rebase on main you won't have to deal with this any longer. Good catch!

clang/test/Preprocessor/suggest-typoed-directive.c
2

I'd drop the -pedantic from here, then we don't need the expected warning on the last line.

I think you should drop the -std=c99 as well so there's a RUN line that's not tied to a specific release, then I'd add a RUN line explicitly for c2x.

11

It's interesting that this one suggested #elifdef instead of #elifndef -- is there anything that can be done for that?

Also, one somewhat interesting question is whether we want to recommend #elifdef and #elifndef outside of C2x/C++2b mode. Those directives only exist in the latest language standard, but Clang supports them as a conforming extension in all language modes. Given that this diagnostic is about typos, I think I'm okay suggesting the directives even in older language modes. That's as likely to be a correct suggestion as not, IMO.

Thank you so much for your clear review!

clang/include/clang/Basic/DiagnosticGroups.td
1023–1024 ↗(On Diff #426262)

Ah, I see. Thank you!

clang/include/clang/Basic/DiagnosticLexKinds.td
362–363

Currently, we only diagnose unknown directives as an error. I think for these special cased conditional directive diagnostics, we'll want to use a warning rather than an error in this circumstance (just in case it turns out to be a valid directive in a skipped conditional block). If we do go that route and make it a warning, I think the warning group should be -Wunknown-directives to mirror -Wunknown-pragmas, -Wunknown-attributes, etc and it should be defined to have the same text as the error case. e.g.,

def err_pp_invalid_directive : Error<
  "invalid preprocessing directive%select{|; did you mean '#%1'?}0"
>;
def warn_pp_invalid_directive : Warning<
  err_pp_invalid_directive.Text>, InGroup<DiagGroup<"unknown-directives">>;

WDYT?

(These were my thoughts before seeing the rest of the patch. After reading the patch, it looks like we have pretty similar ideas here, which is great, but leaving the comment anyway in case you have further opinions.)

For now, I totally agree with deriving a new warning from err_pp_invalid_directive.

However, for future scalability, I think it would be great if we could split those diagnostics into Error & Warning and Help, for example. Rustc does split the diagnostics like the following, and I think this is quite clear. So, a bit apart from this patch, I speculate creating a diagnostic system that can split them would bring Clang diagnostics much more readable.

https://github.com/rust-lang/rust/blob/598d89bf142823b5d84e2eb0f0f9e418ee966a4b/src/test/ui/suggestions/suggest-trait-items.stderr

clang/include/clang/Tooling/LevDistance.h
1 ↗(On Diff #426262)

I totally missed the implementations! Sorry.

But where should I put both findSimilarStr & findSimilarStr?
It seems that their same implementations aren't there.

clang/lib/Lex/PPDirectives.cpp
484–492

Thank you!

Just wondering, but is there any reason not to use the const qualifier?

clang/test/OpenMP/predefined_macro.c
7 ↗(On Diff #426262)

Thank you for fixing the bug! I could confirm the bug was fixed upstream.

clang/test/Preprocessor/suggest-typoed-directive.c
11

It's interesting that this one suggested #elifdef instead of #elifndef -- is there anything that can be done for that?

I found I have to use std::min_element instead of std::max_element because we are finding the nearest (most minimum distance) string. Meanwhile, #elfindef has 2 distance with both #elifdef and #elifndef. After replacing std::max_element with std::min_element, I could suggest #elifndef from #elfinndef.

Also, one somewhat interesting question is whether we want to recommend #elifdef and #elifndef outside of C2x/C++2b mode. Those directives only exist in the latest language standard, but Clang supports them as a conforming extension in all language modes. Given that this diagnostic is about typos, I think I'm okay suggesting the directives even in older language modes. That's as likely to be a correct suggestion as not, IMO.

I agree with you because Clang implements those directives, and the suggested code will also be compilable. I personally think only not compilable suggestions should be avoided. (Or, we might place additional info for outside of C2x/C++2b mode like this is a C2x/C++2b feature but compilable on Clang?)


According to the algorithm of std::min_element, we only get an iterator of the first element even if there is another element that has the same distance. So, #elfindef only suggests #elifdef in accordance with the order of Candidates, and I don't think it is beautiful to depend on the order of candidates. I would say that we can suggest all the same distance like the following, but I'm not sure this is preferable:

#elfindef // diag: unknown directive, did you mean #elifdef or #elifndef?
ken-matsui added inline comments.May 5 2022, 1:13 AM
clang/include/clang/Tooling/LevDistance.h
1 ↗(On Diff #426262)

@aaron.ballman

The computation results are not matched with my implementation and another language implementation.

So, several directives that could be suggested in my implementation are no longer possible to be suggested, such as #id -> #if, #elid -> #elif, and #elsi -> #else, when using llvm::ComputeEditDistance.

The implementation of llvm::ComputeEditDistance might be also correct, but some distances seem to be calculated longer, which causes fewer suggestions.

Should I keep going with this implementation or not?


llvm::ComputeEditDistance:

size_t Dist = Str1.edit_distance(Str2, false);
Str1: id, Str2: if, Dist: 2 # <-- here
Str1: id, Str2: ifdef, Dist: 3
Str1: id, Str2: ifndef, Dist: 4
Str1: ifd, Str2: if, Dist: 1
Str1: ifd, Str2: ifdef, Dist: 2
Str1: ifd, Str2: ifndef, Dist: 3

My implementation:

Str1: id, Str2: if, Dist: 1 # <-- here
Str1: id, Str2: ifdef, Dist: 3
Str1: id, Str2: ifndef, Dist: 4
Str1: ifd, Str2: if, Dist: 1
Str1: ifd, Str2: ifdef, Dist: 2
Str1: ifd, Str2: ifndef, Dist: 3

Rustc implementation:

Str1: id, Str2: if, Dist: 1 # <-- here
Str1: id, Str2: ifdef, Dist: 3
Str1: id, Str2: ifndef, Dist: 4
Str1: ifd, Str2: if, Dist: 1
Str1: ifd, Str2: ifdef, Dist: 2
Str1: ifd, Str2: ifndef, Dist: 3
ken-matsui updated this revision to Diff 427262.May 5 2022, 3:52 AM

Update codes as reviewed

Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2022, 3:52 AM
ken-matsui updated this revision to Diff 427263.May 5 2022, 3:55 AM

Remove unnecessary includes

aaron.ballman added inline comments.
clang/include/clang/Basic/DiagnosticLexKinds.td
362–363

However, for future scalability, I think it would be great if we could split those diagnostics into Error & Warning and Help, for example. Rustc does split the diagnostics like the following, and I think this is quite clear. So, a bit apart from this patch, I speculate creating a diagnostic system that can split them would bring Clang diagnostics much more readable.

CC @cjdb for awareness. (He's working on a project for improving the way in which we communicate diagnostics to users.)

clang/lib/Lex/PPDirectives.cpp
484–492

Just wondering, but is there any reason not to use the const qualifier?

We have some quite inconsistent const correctness in our interfaces, so declaring local objects as const sometimes requires viral changes to make things compile. (I think we've mostly excised the uses of const_cast that would cast away constness in situations which weren't guaranteed to be safe in.)

clang/test/Preprocessor/suggest-typoed-directive.c
11

I agree with you because Clang implements those directives, and the suggested code will also be compilable. I personally think only not compilable suggestions should be avoided. (Or, we might place additional info for outside of C2x/C++2b mode like this is a C2x/C++2b feature but compilable on Clang?)

I may be changing my mind on this a bit. I now see we don't issue an extension warning when using #elifdef or #elifndef in older language modes. That means suggesting those will be correct but only for Clang, and the user won't have any other diagnostics to tell them about the portability issue. And those particular macros are reasonably likely to be used in a header where the user is trying to aim for portability. So I'm starting to think we should only suggest those two in C2x mode (and we should probably add a portability warning for #elifdef and #elifndef, so I filed: https://github.com/llvm/llvm-project/issues/55306)

I would say that we can suggest all the same distance like the following, but I'm not sure this is preferable:

The way we typically handle this is to attach FixIt hints to a note instead of to the diagnostic. This way, automatic fixes aren't applied (because there are multiple choices to pick from) but the user is still able to apply whichever fix they want in an IDE or other tool. It might be worth trying that approach (e.g., if there's only one candidate, attach it to the warning, and if there are two or more, emit a warning without a "did you mean" in it and use a new note for the fixit. e.g.,

#elfindef // expected-warning {{invalid preprocessing directive}} \
             expected-note {{did you mean '#elifdef'?}} \
             expected-note {{did you mean '#elifndef'?}}

WDYT?

cjdb added a comment.May 6 2022, 11:28 AM

Thanks for working on this!

ken-matsui added inline comments.May 7 2022, 2:03 AM
clang/include/clang/Basic/DiagnosticLexKinds.td
362–363

Thank you for CCing him!

Thanks for working on this!

@cjdb

You’re welcome!

I currently think rustc's diagnostics like the following are clear to users rather than attaching a help to a warning like unknown directive found, did you mean #Hoge?.

https://github.com/rust-lang/rust/blob/598d89bf142823b5d84e2eb0f0f9e418ee966a4b/src/test/ui/suggestions/suggest-trait-items.stderr

So, I think it would be better if we could implement diagnostics to be able to emit info like rustc does.
I also implemented a feature to show line numbers in diagnostics, and it may be under review.

https://reviews.llvm.org/D125078

Could I have your opinion here?

clang/lib/Lex/PPDirectives.cpp
484–492

Oh, I see thank you. This is a too difficult problem when working with many contributors though...

clang/test/Preprocessor/suggest-typoed-directive.c
11

I may be changing my mind on this a bit. I now see we don't issue an extension warning when using #elifdef or #elifndef in older language modes. That means suggesting those will be correct but only for Clang, and the user won't have any other diagnostics to tell them about the portability issue. And those particular macros are reasonably likely to be used in a header where the user is trying to aim for portability. So I'm starting to think we should only suggest those two in C2x mode (and we should probably add a portability warning for #elifdef and #elifndef, so I filed: https://github.com/llvm/llvm-project/issues/55306)

Certainly, it would be less confusing to the user to avoid suggestions regarding those two.
I'm going to fix my code to avoid suggesting them in not C2x mode.

Thank you for submitting the issue, I also would like to work on it.

The way we typically handle this is to attach FixIt hints to a note instead of to the diagnostic. This way, automatic fixes aren't applied (because there are multiple choices to pick from) but the user is still able to apply whichever fix they want in an IDE or other tool. It might be worth trying that approach (e.g., if there's only one candidate, attach it to the warning, and if there are two or more, emit a warning without a "did you mean" in it and use a new note for the fixit. e.g.,

#elfindef // expected-warning {{invalid preprocessing directive}} \
             expected-note {{did you mean '#elifdef'?}} \
             expected-note {{did you mean '#elifndef'?}}

WDYT?

This is really cool, but don't you care about the redundancy of did you mean in terms of the llvm team culture?
If not, I will implement notes like the above.

llvm/include/llvm/ADT/StringRef.h
267 ↗(On Diff #427263)

It seems my previous comment that I posted on deleted file disappeared.

Could you please check the following link comment?

https://reviews.llvm.org/D124726#3493222

aaron.ballman added inline comments.
clang/test/Preprocessor/suggest-typoed-directive.c
11

Certainly, it would be less confusing to the user to avoid suggestions regarding those two. I'm going to fix my code to avoid suggesting them in not C2x mode.

+1, thank you!

This is really cool, but don't you care about the redundancy of did you mean in terms of the llvm team culture? If not, I will implement notes like the above.

I would care if the list were potentially unbounded (like, say, with identifiers in general), but because we know this list will only have a max of two entries on it in this case, it seems reasonable to me. I double-checked with @erichkeane to see if he thought it would be an issue, and he agreed that it being a fixed list makes it pretty reasonable.

However, that discussion did raise a question -- why are there two suggestions? elifdef requires a swap + delete and elifndef requires just a swap, so we would have thought that it would have been the only option in the list.

llvm/include/llvm/ADT/StringRef.h
267 ↗(On Diff #427263)

Thank you for calling this out, I had missed it! I'll respond here.

Should I keep going with this implementation or not?

I don't think we should add another implementation of Levenshtein distances; we should stick with the base functionality, which is what's used by the existing typo correction logic: https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaLookup.cpp#L4308

Ideally (but I'm not asking you to do this because it's likely a very big ask), we'd generalize the TypoCorrectionConsumer and related classes so that it can be used during Lex, Parse, or Sema with callbacks from the typo correction consumer to obtain the list of potential names for correction. However, that functionality is so tightly tied to Sema, it may not be particularly plausible.

One thing I noticed is that your implementation does not allow replacements, while the typo correction does. I think that likely explains the behavioral difference you're seeing between implementations.

erichkeane added inline comments.May 9 2022, 8:40 AM
clang/lib/Lex/PPDirectives.cpp
486

Should this be dependent on C modes? it would be kind of silly to suggest "invalid preprocessing token elifndef, did you mean elifndef"?

489

I don't believe this meets our requirements for 'auto'.

aaron.ballman added inline comments.May 9 2022, 8:53 AM
clang/lib/Lex/PPDirectives.cpp
486

Both C and C++ have the new directives and we enable them as an extension in all language modes, so no need to care about it being in C mode specifically. However, I think we do need to care about *which* C and C++ mode we're in so that we only suggest the new directives in C2x and C++2b mode.

Updated the code as reviewed

ken-matsui added inline comments.May 10 2022, 4:18 AM
clang/lib/Lex/PPDirectives.cpp
489

Thank you. I replaced it with the actual type.

clang/test/Preprocessor/suggest-typoed-directive.c
11

With the implementation of Lev distances used in llvm, I could simply suggest #elifdef from #elfidef and #elifndef from #elfindef.

So, in this situation, do you think that we still need to add two notes for conflicted distances?

llvm/include/llvm/ADT/StringRef.h
267 ↗(On Diff #427263)

Thank you.

I updated the test to fit the existing implementation.
The suggestion for #id couldn't be suggested as #if.

Fix the test for FindSimilarStr

aaron.ballman added inline comments.May 10 2022, 9:29 AM
clang/test/Preprocessor/suggest-typoed-directive.c
4

This still suggests that something's wrong as I would imagine this would have an edit distance of 1. Oh, interesting... setting the replacement option to false may have made things better for the elfindef case but worse for the id case?

This is tricky because we want to identify things that are most likely simple typos but exclude things that may reasonably not be a typo but a custom preprocessor directive. Based on that, I *think* setting the replacement option to true gives the more conservative answer (it treats a replacement as 1 edit rather than 2). @erichkeane -- do you have thoughts?

11

No, let's skip the two note behavior. If we find ourselves with multiple suggestions, we'll just leave off the "did you mean?" part of the diagnostic entirely.

erichkeane added inline comments.May 10 2022, 9:36 AM
clang/test/Preprocessor/suggest-typoed-directive.c
4

Not particularly. I don't have a good hold of how much we want to suggest with the 'did you mean'. Line 9 and line 10 here are unfortunate, I would hope those would happen? Its unfortunate we don't have a way to figure out these common typos.

11

Might I suggest we just emit the 'first' suggestion? This isn't really perfect, but I would think that our users put very little thought into it when we suggest the wrong thing.

aaron.ballman added inline comments.May 10 2022, 10:00 AM
clang/test/Preprocessor/suggest-typoed-directive.c
4

Yeah, ideally I want this to "do what I'm thinking" and make all three situations work. :-D

11

Notionally that makes sense, but the situation I'm worried about is when users pass -fixit to the compiler -- then it automatically does whatever, but we have no idea whether that whatever is reasonable or not when we have more than one option.

Set AllowReplacements to true

ken-matsui added inline comments.May 10 2022, 12:57 PM
clang/test/Preprocessor/suggest-typoed-directive.c
11

I could suggest #if from #id, but I had to change #elfindef to #elfinndef to suggest #elifndef.
After setting the AllowReplacements option to true, #elfindef suggested #elifdef (I think it might have the same distance with #elifndef).

aaron.ballman added inline comments.May 11 2022, 4:59 AM
clang/test/Preprocessor/suggest-typoed-directive.c
11

Thanks! I'm coming around more to the idea that we'll never get the "perfect" set of suggestions, and so whatever suggestions we get are quite likely good enough. However, I think it would be helpful to also add the tests where the behavior seems a bit strange, like #elfindef, just to show that we know the behavior is what it is and it doesn't do bad things like crash or whatnot.

llvm/lib/Support/StringRef.cpp
102 ↗(On Diff #428471)

I am less convinced that we want this as a general interface on StringRef. I think callers need to decide whether something is similar enough or not. For example, case sensitivity may not matter for this case but it might matter to another case. Others may wish to limit the max edit_distance for each of the candidates for performance reasons, others may not. We already saw there were questions about whether to allow replacements or not. That sort of thing.

I think this functionality should be moved to PPDirectives.cpp as a static helper function that's specific to this use. Also, because this returns one of the candidates passed in, and those are a StringRef, I think this function should return a StringRef -- this removes allocation costs. I'd also drop the Dist parameter as the function is no longer intended to be a general purpose one.

118–135 ↗(On Diff #428471)

I think we can skip the vector entirely by keeping track of the min edit distance we've seen thus far and which candidate that corresponds to when looping over the candidates.

ken-matsui added inline comments.May 11 2022, 6:49 AM
clang/test/Preprocessor/suggest-typoed-directive.c
11

Thank you :) Ok, I'll add the test as well.

llvm/lib/Support/StringRef.cpp
102 ↗(On Diff #428471)

I am also in favor of it, but where should I put tests for this functionality?
Is it possible to write unit tests for static functions implemented in .cpp files?

118–135 ↗(On Diff #428471)

👍

aaron.ballman added inline comments.May 11 2022, 9:33 AM
llvm/lib/Support/StringRef.cpp
102 ↗(On Diff #428471)

This is something we wouldn't usually add tests for directly but instead test the behavior of through lit -- the tests you already have exercise this code path and would show if there's a situation where we expect a viable candidate and don't find any. So I'd not worry about test coverage for it in this case.

Update the code as reviewed

Thank you for your support!

I updated the code, so could you please review this patch again?

llvm/lib/Support/StringRef.cpp
102 ↗(On Diff #428471)

I see. Thank you!

Remove unused includes in StringRef.h

ken-matsui added inline comments.May 11 2022, 6:27 PM
clang/test/Preprocessor/suggest-typoed-directive.c
37

Here, #elfindef is suggested to #elifdef, not #elifndef, but I believe it will help many users to realize that they have typo'd #elifndef, or maybe they might have meant actually #elifdef.

You should also come up with a release note for the changes.

clang/include/clang/Basic/DiagnosticLexKinds.td
434–435
clang/lib/Lex/PPDirectives.cpp
488–490
1266–1267

I think we should be attempting to suggest a typo for the error case as well e.g.,

#fi WHATEVER
#endif

we currently give no suggestion for that typo, just the error. However, this may require a fair amount of changes because of the various edge cases where we give better diagnostics than "unknown directive". e.g.,

#if WHATEVER // error: unterminated conditional directive
#endfi // no diagnostic

so if it looks like covering error cases is going to be involved, I'm fine doing it in a follow-up if you'd prefer.

clang/test/Preprocessor/suggest-typoed-directive.c
37

+1

Thank you for your review :)

clang/lib/Lex/PPDirectives.cpp
1266–1267

The former can be implemented easily, but the latter seems not easy.
So what about doing the latter in another patch?

Update the code as reviewed and add a release note

Add test for errored directive but no suggestion

aaron.ballman added inline comments.May 12 2022, 10:29 AM
clang/lib/Lex/PPDirectives.cpp
1266–1267

I'm fine doing the error cases entirely in another patch. The current approach here is a problem because it issues a warning and an error for the same line of code. So I'd go back to the way the code was before, and in a follow-up you can handle errors.

I think one approach to consider for that is having an argument to SuggestTypoedDirective() as to whether to err or warn so that we can reuse the same logic.

ken-matsui added inline comments.May 12 2022, 10:41 AM
clang/lib/Lex/PPDirectives.cpp
1266–1267

OK, reverting 👍

I'll work on it. Thank you.

Revert the changes for errored directives

This comment was removed by ken-matsui.

@aaron.ballman

I prepared a new public email address: vcs@kmatsui.me instead of the auto-generated one.
So, could you please use the new one for future commit?

This revision was not accepted when it landed; it landed in state Needs Review.May 13 2022, 6:17 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.

LGTM, thank you for this! I adjusted the release note and commit message somewhat when landing to make sure they were detailed enough.

Thank you for your support!

I see an instance of this warning in the Linux kernel due to the "Now, for unknown directives inside a skipped conditional block, we diagnose the unknown directive as a warning if it is sufficiently similar to a directive specific to preprocessor conditional blocks" part of this change:

arch/x86/crypto/aesni-intel_asm.S:532:8: warning: invalid preprocessing directive, did you mean '#if'? [-Wunknown-directives]
                                        # in in order to perform
                                          ^
arch/x86/crypto/aesni-intel_asm.S:547:8: warning: invalid preprocessing directive, did you mean '#if'? [-Wunknown-directives]
                                        # in in order to perform
                                          ^
2 warnings generated.

This is a comment within an assembler file that will be preprocessed (this is a 32-bit x86 build and the block is guarded by #ifdef __x86_64__ on line 500):

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/crypto/aesni-intel_asm.S?h=v5.18-rc7#n532

Is there anything that can be done to improve this heuristic for this case? I can try to push a patch to change the comment style for this one instance but I always worry that a warning of this nature will appear in the future and result in the kernel disabling this warning entirely.

I see an instance of this warning in the Linux kernel due to the "Now, for unknown directives inside a skipped conditional block, we diagnose the unknown directive as a warning if it is sufficiently similar to a directive specific to preprocessor conditional blocks" part of this change:

arch/x86/crypto/aesni-intel_asm.S:532:8: warning: invalid preprocessing directive, did you mean '#if'? [-Wunknown-directives]
                                        # in in order to perform
                                          ^
arch/x86/crypto/aesni-intel_asm.S:547:8: warning: invalid preprocessing directive, did you mean '#if'? [-Wunknown-directives]
                                        # in in order to perform
                                          ^
2 warnings generated.

Oh wow, that's a really neat one!

This is a comment within an assembler file that will be preprocessed (this is a 32-bit x86 build and the block is guarded by #ifdef __x86_64__ on line 500):

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/crypto/aesni-intel_asm.S?h=v5.18-rc7#n532

Is there anything that can be done to improve this heuristic for this case? I can try to push a patch to change the comment style for this one instance but I always worry that a warning of this nature will appear in the future and result in the kernel disabling this warning entirely.

Ah, and we don't get an error for those because of special logic: https://github.com/llvm/llvm-project/blob/main/clang/lib/Lex/PPDirectives.cpp#L1243 and it looks like we may need similar logic before issuing the warnings as well.

@nathanchance

Oops, thank you for your comment!
I would like to work on a hotfix patch for this issue.

@aaron.ballman

Should we entirely opt-out of this suggestion on .S files? I think there are other approaches here, such as decreasing the max edit distance to avoid suggesting this case in .S files, but this approach is avoiding just this edge case so that it would possibly bring a new issue like this. Conversely, opting out of this suggestion on the whole .S files will not be able to catch any typoes. Considering possible edge cases (# directives are also treated as comments), I think opting out would be the only way, unfortunately. What do you think?

For now, I will create a patch opting out of this suggestion on .S files.

@nathanchance

Oops, thank you for your comment!
I would like to work on a hotfix patch for this issue.

@aaron.ballman

Should we entirely opt-out of this suggestion on .S files? I think there are other approaches here, such as decreasing the max edit distance to avoid suggesting this case in .S files, but this approach is avoiding just this edge case so that it would possibly bring a new issue like this. Conversely, opting out of this suggestion on the whole .S files will not be able to catch any typoes. Considering possible edge cases (# directives are also treated as comments), I think opting out would be the only way, unfortunately. What do you think?

For now, I will create a patch opting out of this suggestion on .S files.

I believe the if (getLangOpts().AsmPreprocessor) condition is appropriately accurate here. I don't think changing the edit distance would be a good behavior for anyone, and I don't see any other appropriate heuristics.

I see. Thank you. I fixed the issue using its workaround.

https://reviews.llvm.org/D125727