This is an archive of the discontinued LLVM Phabricator instance.

Diagnose likely typos in include statements
ClosedPublic

Authored by christylee on Aug 27 2018, 4:59 PM.

Details

Summary

When someone writes

#include "<some_file.h>"

or

#include " some_file.h "

the compiler returns "file not fond..." with fonts and quotes that may make it hard to see there are excess quotes or surprising bytes in the filename. Assuming that files are usually logically named and start and end with an alphanumeric character, we can check for the file's existence by stripping the non-alphanumeric leading or trailing characters. If the file is found, emit a non-fatal error with a fixithint.

Diff Detail

Event Timeline

christylee created this revision.Aug 27 2018, 4:59 PM
modocache added subscribers: aaron.ballman, erikjv, modocache.

This looks good to me, but maybe some people who've modified this part of the codebase before could review this as well? @aaron.ballman added a fix-it for angled/quoted strings in rL160406, and more recently @erikjv modified some of this code in rL285057. If anyone could suggest other reviewers that would be great as well!

aaron.ballman added inline comments.Aug 28 2018, 12:12 PM
include/clang/Basic/DiagnosticLexKinds.td
488

This seems like the wrong warning group for this diagnostic as it doesn't relate to pragmas. However, we may not want this exposed as a separate warning, either.

lib/Lex/PPDirectives.cpp
1926–1927

I don't think we want to produce two different diagnostics for the same line of code. What if, instead, we augment the error diagnostic so that it can produce additional information in this case?

def err_pp_file_not_found : Error<"'%0' file not found%select{|, possibly due to leading or trailing non-alphanumeric characters in the file name}1">, DefaultFatal;

(or something along those lines.)

test/Frontend/include-likely-typo.c
1–4 ↗(On Diff #162779)

I think we want this test to live in test/Preprocessor instead of test/Frontend.

christylee edited the summary of this revision. (Show Details)

Merged warning with existing file_not_found_error.

aaron.ballman added inline comments.
lib/Lex/PPDirectives.cpp
1888

I'd pass false directly here...

1920

... and lower isFileNotFoundLikelyTypo to here and rename it IsFileNotFoundLikelyTypo to meet our usual naming conventions...

1925–1926

...then remove this call to Diag()...

1928–1929

As it stands, this is going to diagnose the same issue twice.

test/Preprocessor/include-likely-typo.c
1–2 ↗(On Diff #163132)

I don't think this is the correct formulation for the test, which explains why it's passing for you. I think the test should be:

// RUN: %clang_cc1 %s -verify

#include "<hello.h>" // expected-error {{'<hello.h>' file not found, possibly due to leading or trailing non-alphanumeric characters in the file name}}

Instead of guessing whether the corrected filename would be valid, why not strip off the leading and trailing non-alphanumeric characters, look up the resulting filename, and find out? If we did that, then not only could we be a lot more confident that we'd found the file that was intended, but we could also recover from the error by including the trimmed filename.

Instead of guessing whether the corrected filename would be valid, why not strip off the leading and trailing non-alphanumeric characters, look up the resulting filename, and find out? If we did that, then not only could we be a lot more confident that we'd found the file that was intended, but we could also recover from the error by including the trimmed filename.

I was contemplating that approach, but I was also a bit worried about reaching out to the filesystem again. We already reach out once for angled vs quoted include path issues, and I don't think these two tests can be combined easily. However, this is on an error path and so performance isn't super important, so I think I've convinced myself this is a good approach to try.

christylee marked 2 inline comments as done.Aug 31 2018, 7:51 AM

Instead of guessing whether the corrected filename would be valid, why not strip off the leading and trailing non-alphanumeric characters, look up the resulting filename, and find out? If we did that, then not only could we be a lot more confident that we'd found the file that was intended, but we could also recover from the error by including the trimmed filename.

Should we lookup the stripped filename and then error out on a warning along the lines of "'<hello,h>' file not found, did you mean 'hello,h'", or attempt to continue to compile using the file we found (i.e. "hello.h"). I'm learning towards the former because I'm worried the latter might be "too clever", in case the developers actually think <hello.h> and hello.h are two different files.

Instead of guessing whether the corrected filename would be valid, why not strip off the leading and trailing non-alphanumeric characters, look up the resulting filename, and find out? If we did that, then not only could we be a lot more confident that we'd found the file that was intended, but we could also recover from the error by including the trimmed filename.

Should we lookup the stripped filename and then error out on a warning along the lines of "'<hello,h>' file not found, did you mean 'hello,h'", or attempt to continue to compile using the file we found (i.e. "hello.h"). I'm learning towards the former because I'm worried the latter might be "too clever", in case the developers actually think <hello.h> and hello.h are two different files.

I think we should issue a (non-fatal) error along the lines of "<hello.h> not found; did you mean "hello.h"?" with a FixItHint, and carry on by including the file we believe they meant. The chance of hitting a false positive by doing this seems extremely remote to me, so I'm not worried about it being "too clever" and doing the wrong thing. (You'd not only need for the developer to intend to use a file whose name starts or ends with an odd character, but also for that file to not exist and a file whose name is the same but with said character(s) removed to actually exist.)

christylee edited the summary of this revision. (Show Details)

Emit non-fatal error for typo if file exists.

christylee marked 3 inline comments as done.Sep 12 2018, 3:04 PM

Thanks, some comments but the approach here looks great.

include/clang/Basic/DiagnosticLexKinds.td
428–429

Our usual convention is to phrase this as "'%0' file not found; did you mean '%1'?". I don't think we need to describe the exact process we used to figure out the intended filename here; the user should be able to compare the names before and after to determine what we changed.

lib/Lex/PPDirectives.cpp
1901

You're passing in false for isAngled here, and producing a double-quoted replacement below. Is that intentional? I would expect that we would preserve the form of the header-name (quoted or angled) and suggest a replacement with the same form.

(No action needed, but curious on your thoughts)

One thing I'd like to do sometime is add code completion of filenames in #include directives.
This would mean listing the relevant directories from the include path. (VFS is slow for this today, but I have a patch out for it).

Maybe it would make sense to do that here and use edit-distance to pick the suggestion, like typo correction elsewhere?
Could be a way to extend this patch's behavior in the future.

christylee added inline comments.Sep 13 2018, 8:41 AM
lib/Lex/PPDirectives.cpp
1901

My mistake, thanks for catching it!

Addressed @rsmith 's comments.

rsmith accepted this revision.Sep 13 2018, 11:58 AM

Looks good (with a couple of minor code style adjustments). Do you need someone to commit this for you?

One thing I'd like to do sometime is add code completion of filenames in #include directives.
This would mean listing the relevant directories from the include path. (VFS is slow for this today, but I have a patch out for it).

Maybe it would make sense to do that here and use edit-distance to pick the suggestion, like typo correction elsewhere?
Could be a way to extend this patch's behavior in the future.

I agree; I think this would be a very interesting extension for this patch if someone feels motivated.

lib/Lex/PPDirectives.cpp
1891

Nit: please spell out the type here, because it's not obvious from the initializer. Please capitalize the variable name.

1907

Nit: please capitalize this variable name.

This revision is now accepted and ready to land.Sep 13 2018, 11:58 AM
xbolva00 added a subscriber: xbolva00.EditedSep 13 2018, 12:07 PM

While here, I would love to have fixint hints for unknown functions.

E.g.
std::forward somewhere in code -> fixit: "try to add #include <utility>"

christylee marked 6 inline comments as done.

@rsmith , thanks for the review, I fixed the variable capitalization. If you could land it for me that'll be awesome!

One thing I'd like to do sometime is add code completion of filenames in #include directives.
This would mean listing the relevant directories from the include path. (VFS is slow for this today, but I have a patch out for it).

Maybe it would make sense to do that here and use edit-distance to pick the suggestion, like typo correction elsewhere?
Could be a way to extend this patch's behavior in the future.

I also think it's a good idea. I'm happy to take a crack at it!

The tests seem to have disappeared form the diff.

Added tests.

christylee marked an inline comment as done.Sep 13 2018, 12:55 PM
This revision was automatically updated to reflect the committed changes.