Page MenuHomePhabricator

Please use GitHub pull requests for new patches. Avoid migrating existing patches. Phabricator shutdown timeline

Suggest typoed directives in preprocessor conditionals
ClosedPublic

Authored by ken-matsui on Apr 30 2022, 8:34 PM.

Details

Summary

This patch implements suggestion for typoed directives in preprocessor conditionals. The related issue is: https://github.com/llvm/llvm-project/issues/51598.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes
ken-matsui added inline comments.May 5 2022, 1:13 AM
clang/include/clang/Tooling/LevDistance.h
1 ↗(On Diff #426262)

@aaron.ballman

The computation results are not matched with my implementation and another language implementation.

So, several directives that could be suggested in my implementation are no longer possible to be suggested, such as #id -> #if, #elid -> #elif, and #elsi -> #else, when using llvm::ComputeEditDistance.

The implementation of llvm::ComputeEditDistance might be also correct, but some distances seem to be calculated longer, which causes fewer suggestions.

Should I keep going with this implementation or not?


llvm::ComputeEditDistance:

size_t Dist = Str1.edit_distance(Str2, false);
Str1: id, Str2: if, Dist: 2 # <-- here
Str1: id, Str2: ifdef, Dist: 3
Str1: id, Str2: ifndef, Dist: 4
Str1: ifd, Str2: if, Dist: 1
Str1: ifd, Str2: ifdef, Dist: 2
Str1: ifd, Str2: ifndef, Dist: 3

My implementation:

Str1: id, Str2: if, Dist: 1 # <-- here
Str1: id, Str2: ifdef, Dist: 3
Str1: id, Str2: ifndef, Dist: 4
Str1: ifd, Str2: if, Dist: 1
Str1: ifd, Str2: ifdef, Dist: 2
Str1: ifd, Str2: ifndef, Dist: 3

Rustc implementation:

Str1: id, Str2: if, Dist: 1 # <-- here
Str1: id, Str2: ifdef, Dist: 3
Str1: id, Str2: ifndef, Dist: 4
Str1: ifd, Str2: if, Dist: 1
Str1: ifd, Str2: ifdef, Dist: 2
Str1: ifd, Str2: ifndef, Dist: 3
ken-matsui updated this revision to Diff 427262.May 5 2022, 3:52 AM

Update codes as reviewed

Herald added a project: Restricted Project. · View Herald TranscriptMay 5 2022, 3:52 AM
ken-matsui updated this revision to Diff 427263.May 5 2022, 3:55 AM

Remove unnecessary includes

aaron.ballman added inline comments.
clang/include/clang/Basic/DiagnosticLexKinds.td
362–363

However, for future scalability, I think it would be great if we could split those diagnostics into Error & Warning and Help, for example. Rustc does split the diagnostics like the following, and I think this is quite clear. So, a bit apart from this patch, I speculate creating a diagnostic system that can split them would bring Clang diagnostics much more readable.

CC @cjdb for awareness. (He's working on a project for improving the way in which we communicate diagnostics to users.)

clang/lib/Lex/PPDirectives.cpp
484–492

Just wondering, but is there any reason not to use the const qualifier?

We have some quite inconsistent const correctness in our interfaces, so declaring local objects as const sometimes requires viral changes to make things compile. (I think we've mostly excised the uses of const_cast that would cast away constness in situations which weren't guaranteed to be safe in.)

clang/test/Preprocessor/suggest-typoed-directive.c
11

I agree with you because Clang implements those directives, and the suggested code will also be compilable. I personally think only not compilable suggestions should be avoided. (Or, we might place additional info for outside of C2x/C++2b mode like this is a C2x/C++2b feature but compilable on Clang?)

I may be changing my mind on this a bit. I now see we don't issue an extension warning when using #elifdef or #elifndef in older language modes. That means suggesting those will be correct but only for Clang, and the user won't have any other diagnostics to tell them about the portability issue. And those particular macros are reasonably likely to be used in a header where the user is trying to aim for portability. So I'm starting to think we should only suggest those two in C2x mode (and we should probably add a portability warning for #elifdef and #elifndef, so I filed: https://github.com/llvm/llvm-project/issues/55306)

I would say that we can suggest all the same distance like the following, but I'm not sure this is preferable:

The way we typically handle this is to attach FixIt hints to a note instead of to the diagnostic. This way, automatic fixes aren't applied (because there are multiple choices to pick from) but the user is still able to apply whichever fix they want in an IDE or other tool. It might be worth trying that approach (e.g., if there's only one candidate, attach it to the warning, and if there are two or more, emit a warning without a "did you mean" in it and use a new note for the fixit. e.g.,

#elfindef // expected-warning {{invalid preprocessing directive}} \
             expected-note {{did you mean '#elifdef'?}} \
             expected-note {{did you mean '#elifndef'?}}

WDYT?

cjdb added a comment.May 6 2022, 11:28 AM

Thanks for working on this!

ken-matsui added inline comments.May 7 2022, 2:03 AM
clang/include/clang/Basic/DiagnosticLexKinds.td
362–363

Thank you for CCing him!

Thanks for working on this!

@cjdb

You’re welcome!

I currently think rustc's diagnostics like the following are clear to users rather than attaching a help to a warning like unknown directive found, did you mean #Hoge?.

https://github.com/rust-lang/rust/blob/598d89bf142823b5d84e2eb0f0f9e418ee966a4b/src/test/ui/suggestions/suggest-trait-items.stderr

So, I think it would be better if we could implement diagnostics to be able to emit info like rustc does.
I also implemented a feature to show line numbers in diagnostics, and it may be under review.

https://reviews.llvm.org/D125078

Could I have your opinion here?

clang/lib/Lex/PPDirectives.cpp
484–492

Oh, I see thank you. This is a too difficult problem when working with many contributors though...

clang/test/Preprocessor/suggest-typoed-directive.c
11

I may be changing my mind on this a bit. I now see we don't issue an extension warning when using #elifdef or #elifndef in older language modes. That means suggesting those will be correct but only for Clang, and the user won't have any other diagnostics to tell them about the portability issue. And those particular macros are reasonably likely to be used in a header where the user is trying to aim for portability. So I'm starting to think we should only suggest those two in C2x mode (and we should probably add a portability warning for #elifdef and #elifndef, so I filed: https://github.com/llvm/llvm-project/issues/55306)

Certainly, it would be less confusing to the user to avoid suggestions regarding those two.
I'm going to fix my code to avoid suggesting them in not C2x mode.

Thank you for submitting the issue, I also would like to work on it.

The way we typically handle this is to attach FixIt hints to a note instead of to the diagnostic. This way, automatic fixes aren't applied (because there are multiple choices to pick from) but the user is still able to apply whichever fix they want in an IDE or other tool. It might be worth trying that approach (e.g., if there's only one candidate, attach it to the warning, and if there are two or more, emit a warning without a "did you mean" in it and use a new note for the fixit. e.g.,

#elfindef // expected-warning {{invalid preprocessing directive}} \
             expected-note {{did you mean '#elifdef'?}} \
             expected-note {{did you mean '#elifndef'?}}

WDYT?

This is really cool, but don't you care about the redundancy of did you mean in terms of the llvm team culture?
If not, I will implement notes like the above.

llvm/include/llvm/ADT/StringRef.h
267 ↗(On Diff #427263)

It seems my previous comment that I posted on deleted file disappeared.

Could you please check the following link comment?

https://reviews.llvm.org/D124726#3493222

aaron.ballman added inline comments.
clang/test/Preprocessor/suggest-typoed-directive.c
11

Certainly, it would be less confusing to the user to avoid suggestions regarding those two. I'm going to fix my code to avoid suggesting them in not C2x mode.

+1, thank you!

This is really cool, but don't you care about the redundancy of did you mean in terms of the llvm team culture? If not, I will implement notes like the above.

I would care if the list were potentially unbounded (like, say, with identifiers in general), but because we know this list will only have a max of two entries on it in this case, it seems reasonable to me. I double-checked with @erichkeane to see if he thought it would be an issue, and he agreed that it being a fixed list makes it pretty reasonable.

However, that discussion did raise a question -- why are there two suggestions? elifdef requires a swap + delete and elifndef requires just a swap, so we would have thought that it would have been the only option in the list.

llvm/include/llvm/ADT/StringRef.h
267 ↗(On Diff #427263)

Thank you for calling this out, I had missed it! I'll respond here.

Should I keep going with this implementation or not?

I don't think we should add another implementation of Levenshtein distances; we should stick with the base functionality, which is what's used by the existing typo correction logic: https://github.com/llvm/llvm-project/blob/main/clang/lib/Sema/SemaLookup.cpp#L4308

Ideally (but I'm not asking you to do this because it's likely a very big ask), we'd generalize the TypoCorrectionConsumer and related classes so that it can be used during Lex, Parse, or Sema with callbacks from the typo correction consumer to obtain the list of potential names for correction. However, that functionality is so tightly tied to Sema, it may not be particularly plausible.

One thing I noticed is that your implementation does not allow replacements, while the typo correction does. I think that likely explains the behavioral difference you're seeing between implementations.

erichkeane added inline comments.May 9 2022, 8:40 AM
clang/lib/Lex/PPDirectives.cpp
486

Should this be dependent on C modes? it would be kind of silly to suggest "invalid preprocessing token elifndef, did you mean elifndef"?

489

I don't believe this meets our requirements for 'auto'.

aaron.ballman added inline comments.May 9 2022, 8:53 AM
clang/lib/Lex/PPDirectives.cpp
486

Both C and C++ have the new directives and we enable them as an extension in all language modes, so no need to care about it being in C mode specifically. However, I think we do need to care about *which* C and C++ mode we're in so that we only suggest the new directives in C2x and C++2b mode.

Updated the code as reviewed

ken-matsui added inline comments.May 10 2022, 4:18 AM
clang/lib/Lex/PPDirectives.cpp
489

Thank you. I replaced it with the actual type.

clang/test/Preprocessor/suggest-typoed-directive.c
11

With the implementation of Lev distances used in llvm, I could simply suggest #elifdef from #elfidef and #elifndef from #elfindef.

So, in this situation, do you think that we still need to add two notes for conflicted distances?

llvm/include/llvm/ADT/StringRef.h
267 ↗(On Diff #427263)

Thank you.

I updated the test to fit the existing implementation.
The suggestion for #id couldn't be suggested as #if.

Fix the test for FindSimilarStr

aaron.ballman added inline comments.May 10 2022, 9:29 AM
clang/test/Preprocessor/suggest-typoed-directive.c
4

This still suggests that something's wrong as I would imagine this would have an edit distance of 1. Oh, interesting... setting the replacement option to false may have made things better for the elfindef case but worse for the id case?

This is tricky because we want to identify things that are most likely simple typos but exclude things that may reasonably not be a typo but a custom preprocessor directive. Based on that, I *think* setting the replacement option to true gives the more conservative answer (it treats a replacement as 1 edit rather than 2). @erichkeane -- do you have thoughts?

11

No, let's skip the two note behavior. If we find ourselves with multiple suggestions, we'll just leave off the "did you mean?" part of the diagnostic entirely.

erichkeane added inline comments.May 10 2022, 9:36 AM
clang/test/Preprocessor/suggest-typoed-directive.c
4

Not particularly. I don't have a good hold of how much we want to suggest with the 'did you mean'. Line 9 and line 10 here are unfortunate, I would hope those would happen? Its unfortunate we don't have a way to figure out these common typos.

11

Might I suggest we just emit the 'first' suggestion? This isn't really perfect, but I would think that our users put very little thought into it when we suggest the wrong thing.

aaron.ballman added inline comments.May 10 2022, 10:00 AM
clang/test/Preprocessor/suggest-typoed-directive.c
4

Yeah, ideally I want this to "do what I'm thinking" and make all three situations work. :-D

11

Notionally that makes sense, but the situation I'm worried about is when users pass -fixit to the compiler -- then it automatically does whatever, but we have no idea whether that whatever is reasonable or not when we have more than one option.

Set AllowReplacements to true

ken-matsui added inline comments.May 10 2022, 12:57 PM
clang/test/Preprocessor/suggest-typoed-directive.c
11

I could suggest #if from #id, but I had to change #elfindef to #elfinndef to suggest #elifndef.
After setting the AllowReplacements option to true, #elfindef suggested #elifdef (I think it might have the same distance with #elifndef).

aaron.ballman added inline comments.May 11 2022, 4:59 AM
clang/test/Preprocessor/suggest-typoed-directive.c
11

Thanks! I'm coming around more to the idea that we'll never get the "perfect" set of suggestions, and so whatever suggestions we get are quite likely good enough. However, I think it would be helpful to also add the tests where the behavior seems a bit strange, like #elfindef, just to show that we know the behavior is what it is and it doesn't do bad things like crash or whatnot.

llvm/lib/Support/StringRef.cpp
102 ↗(On Diff #428471)

I am less convinced that we want this as a general interface on StringRef. I think callers need to decide whether something is similar enough or not. For example, case sensitivity may not matter for this case but it might matter to another case. Others may wish to limit the max edit_distance for each of the candidates for performance reasons, others may not. We already saw there were questions about whether to allow replacements or not. That sort of thing.

I think this functionality should be moved to PPDirectives.cpp as a static helper function that's specific to this use. Also, because this returns one of the candidates passed in, and those are a StringRef, I think this function should return a StringRef -- this removes allocation costs. I'd also drop the Dist parameter as the function is no longer intended to be a general purpose one.

118–135 ↗(On Diff #428471)

I think we can skip the vector entirely by keeping track of the min edit distance we've seen thus far and which candidate that corresponds to when looping over the candidates.

ken-matsui added inline comments.May 11 2022, 6:49 AM
clang/test/Preprocessor/suggest-typoed-directive.c
11

Thank you :) Ok, I'll add the test as well.

llvm/lib/Support/StringRef.cpp
102 ↗(On Diff #428471)

I am also in favor of it, but where should I put tests for this functionality?
Is it possible to write unit tests for static functions implemented in .cpp files?

118–135 ↗(On Diff #428471)

👍

aaron.ballman added inline comments.May 11 2022, 9:33 AM
llvm/lib/Support/StringRef.cpp
102 ↗(On Diff #428471)

This is something we wouldn't usually add tests for directly but instead test the behavior of through lit -- the tests you already have exercise this code path and would show if there's a situation where we expect a viable candidate and don't find any. So I'd not worry about test coverage for it in this case.

Update the code as reviewed

Thank you for your support!

I updated the code, so could you please review this patch again?

llvm/lib/Support/StringRef.cpp
102 ↗(On Diff #428471)

I see. Thank you!

Remove unused includes in StringRef.h

ken-matsui added inline comments.May 11 2022, 6:27 PM
clang/test/Preprocessor/suggest-typoed-directive.c
37

Here, #elfindef is suggested to #elifdef, not #elifndef, but I believe it will help many users to realize that they have typo'd #elifndef, or maybe they might have meant actually #elifdef.

You should also come up with a release note for the changes.

clang/include/clang/Basic/DiagnosticLexKinds.td
434–435
clang/lib/Lex/PPDirectives.cpp
488–490
1266–1267

I think we should be attempting to suggest a typo for the error case as well e.g.,

#fi WHATEVER
#endif

we currently give no suggestion for that typo, just the error. However, this may require a fair amount of changes because of the various edge cases where we give better diagnostics than "unknown directive". e.g.,

#if WHATEVER // error: unterminated conditional directive
#endfi // no diagnostic

so if it looks like covering error cases is going to be involved, I'm fine doing it in a follow-up if you'd prefer.

clang/test/Preprocessor/suggest-typoed-directive.c
37

+1

Thank you for your review :)

clang/lib/Lex/PPDirectives.cpp
1266–1267

The former can be implemented easily, but the latter seems not easy.
So what about doing the latter in another patch?

Update the code as reviewed and add a release note

Add test for errored directive but no suggestion

aaron.ballman added inline comments.May 12 2022, 10:29 AM
clang/lib/Lex/PPDirectives.cpp
1266–1267

I'm fine doing the error cases entirely in another patch. The current approach here is a problem because it issues a warning and an error for the same line of code. So I'd go back to the way the code was before, and in a follow-up you can handle errors.

I think one approach to consider for that is having an argument to SuggestTypoedDirective() as to whether to err or warn so that we can reuse the same logic.

ken-matsui added inline comments.May 12 2022, 10:41 AM
clang/lib/Lex/PPDirectives.cpp
1266–1267

OK, reverting 👍

I'll work on it. Thank you.

Revert the changes for errored directives

This comment was removed by ken-matsui.

@aaron.ballman

I prepared a new public email address: vcs@kmatsui.me instead of the auto-generated one.
So, could you please use the new one for future commit?

This revision was not accepted when it landed; it landed in state Needs Review.May 13 2022, 6:17 AM
This revision was landed with ongoing or failed builds.
This revision was automatically updated to reflect the committed changes.

LGTM, thank you for this! I adjusted the release note and commit message somewhat when landing to make sure they were detailed enough.

Thank you for your support!

I see an instance of this warning in the Linux kernel due to the "Now, for unknown directives inside a skipped conditional block, we diagnose the unknown directive as a warning if it is sufficiently similar to a directive specific to preprocessor conditional blocks" part of this change:

arch/x86/crypto/aesni-intel_asm.S:532:8: warning: invalid preprocessing directive, did you mean '#if'? [-Wunknown-directives]
                                        # in in order to perform
                                          ^
arch/x86/crypto/aesni-intel_asm.S:547:8: warning: invalid preprocessing directive, did you mean '#if'? [-Wunknown-directives]
                                        # in in order to perform
                                          ^
2 warnings generated.

This is a comment within an assembler file that will be preprocessed (this is a 32-bit x86 build and the block is guarded by #ifdef __x86_64__ on line 500):

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/crypto/aesni-intel_asm.S?h=v5.18-rc7#n532

Is there anything that can be done to improve this heuristic for this case? I can try to push a patch to change the comment style for this one instance but I always worry that a warning of this nature will appear in the future and result in the kernel disabling this warning entirely.

I see an instance of this warning in the Linux kernel due to the "Now, for unknown directives inside a skipped conditional block, we diagnose the unknown directive as a warning if it is sufficiently similar to a directive specific to preprocessor conditional blocks" part of this change:

arch/x86/crypto/aesni-intel_asm.S:532:8: warning: invalid preprocessing directive, did you mean '#if'? [-Wunknown-directives]
                                        # in in order to perform
                                          ^
arch/x86/crypto/aesni-intel_asm.S:547:8: warning: invalid preprocessing directive, did you mean '#if'? [-Wunknown-directives]
                                        # in in order to perform
                                          ^
2 warnings generated.

Oh wow, that's a really neat one!

This is a comment within an assembler file that will be preprocessed (this is a 32-bit x86 build and the block is guarded by #ifdef __x86_64__ on line 500):

https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/arch/x86/crypto/aesni-intel_asm.S?h=v5.18-rc7#n532

Is there anything that can be done to improve this heuristic for this case? I can try to push a patch to change the comment style for this one instance but I always worry that a warning of this nature will appear in the future and result in the kernel disabling this warning entirely.

Ah, and we don't get an error for those because of special logic: https://github.com/llvm/llvm-project/blob/main/clang/lib/Lex/PPDirectives.cpp#L1243 and it looks like we may need similar logic before issuing the warnings as well.

@nathanchance

Oops, thank you for your comment!
I would like to work on a hotfix patch for this issue.

@aaron.ballman

Should we entirely opt-out of this suggestion on .S files? I think there are other approaches here, such as decreasing the max edit distance to avoid suggesting this case in .S files, but this approach is avoiding just this edge case so that it would possibly bring a new issue like this. Conversely, opting out of this suggestion on the whole .S files will not be able to catch any typoes. Considering possible edge cases (# directives are also treated as comments), I think opting out would be the only way, unfortunately. What do you think?

For now, I will create a patch opting out of this suggestion on .S files.

@nathanchance

Oops, thank you for your comment!
I would like to work on a hotfix patch for this issue.

@aaron.ballman

Should we entirely opt-out of this suggestion on .S files? I think there are other approaches here, such as decreasing the max edit distance to avoid suggesting this case in .S files, but this approach is avoiding just this edge case so that it would possibly bring a new issue like this. Conversely, opting out of this suggestion on the whole .S files will not be able to catch any typoes. Considering possible edge cases (# directives are also treated as comments), I think opting out would be the only way, unfortunately. What do you think?

For now, I will create a patch opting out of this suggestion on .S files.

I believe the if (getLangOpts().AsmPreprocessor) condition is appropriately accurate here. I don't think changing the edit distance would be a good behavior for anyone, and I don't see any other appropriate heuristics.

I see. Thank you. I fixed the issue using its workaround.

https://reviews.llvm.org/D125727