This is an archive of the discontinued LLVM Phabricator instance.

[CodeComplete] Add completions for filenames in #include directives.
ClosedPublic

Authored by sammccall on Sep 14 2018, 12:26 AM.

Details

Summary

The dir component ("somedir" in #include <somedir/fo...>) is considered fixed.
We append "foo" to each directory on the include path, and then list its files.

Completions are of the forms:
<somedir/foo.h>
^-text--^^-TT-^ (TypedText)
and
<somedir/foodir/
^-text--^^-TT--^

The filter is set to the filename part ("fo"), so fuzzy matching can be
applied to the filename only.

No fancy scoring/priorities are set, and no information is added to
CodeCompleteResult to make smart scoring possible. Could be in future.

Note directory iteration is currently stat-happy on unix but D51921 will fix.

Diff Detail

Event Timeline

sammccall created this revision.Sep 14 2018, 12:26 AM

Amazing, can't wait for it to land!

lib/Lex/Lexer.cpp
1931

It's also valid to use \ for path separators on Windows, maybe also handle those?
Clang also supports this when running e.g. clang-cl /c foo.cpp.

2051

Does that mean we're losing completion at the end of file?
Not sure if it's a big deal. The only common pattern I can come up with is starting with an empty file and writing:

#include "^

Should we handle that? WDYT?

2061

This code looks exactly the same between two functions.
Maybe extract it into a helper function?

lib/Sema/SemaCodeComplete.cpp
8021

Could we avoid adding /, > and " if it's currently missing?
Otherwise it'll be annoying for editors that auto-add closing quotes and completions when editing existing includes, i.e.

// It's cool if we complete bar.h" (with closing quote) here:
#include "foo/^
// It's annoying if we complete an extra closing quote here:
#include "foo/^"
8028

Maybe leave this out?
When completing std::^ the completion is vector, not std::vector.
In the same spirit, for includes I would expect completions for #include <foo/^> to be bar.h, not <foo/bar.h>.

8046

The size of /usr/include on my machine is ~830 files and dirs, which is a little close to a limit.
Not sure if the number of files grow with time or with a number of installed packages, but maybe we want a slightly higher limit to make sure it's won stop showing some results from this important include dir anytime soon?

8057

Maybe do case-insensitive matching?
A corner case, but still...

8076

NIT: Maybe return from each branch and put llvm_unreachable at the end of the function?
To get an error in case the new enum value is added in addition to the compiler warning.

Feel free to ignore if you like the current version more, the compiler warning is not likely to go unnoticed anyway.

Thanks a lot, great comments!
I haven't made changes yet (I will) but there's a few questions to work out first...

lib/Lex/Lexer.cpp
1931

Ah yuck, that's true (for some value of "valid"). Pre-C++11 it was explicitly UB...

So if a codebase consistently uses backslashes, we're going to break that style at least sometimes: we can't know whether to complete #include <foo as <foobar/ or <foobar\.

Any objection to just normalizing slashes for the whole include as part of completion?
e.g. #include <foo\ba would complete as #include <foo/bar/ and #include <foo/baz.h>?

Anything else is going to be fiddly, probably only help a few users a little bit, and leave edge cases.

2051

Ah no, it doesn't, though I thought so at first.

First: the address of C is CurPtr-1 (getAndAdvanceChar is in spirit return *CurPtr++).

Second: the buffer is null-terminated one-past-the-end (*BufferEnd == 0) and also has an embedded null at the completion point. So when completing at EOF it looks like:

# i n c l u d e < a \0 \0
                    ^C ^CurPtr
^BufferStart    ^BufPtr^BufferEnd

and we hit the case below. We only hit this case if we never saw a completion point.

lib/Sema/SemaCodeComplete.cpp
8021

Don't editors that add quotes/parens typically avoid duplicating them when they're typed/completed?

This isn't actually easy to do, because after completing file.h you want the cursor after the closing quote, and after completing dir/ you want the cursor inside the closing quote. I don't see a way to do this with CodeCompletionString other than generating the quote in the former case but not the latter. (At least not one that doesn't break something else...)

8028

I tried a few iterations here and this was the only one that didn't seem totally odd. Basically it's about the fact that these are quoted strings.

Ergonomically, having the closing quote completed after a filename *feels* right - you can hit enter afterwards and go to the next line.
If the closing quote is inserted, it has to be displayed (can't be avoided, also would be confusing).

So if you make the completion range just the filename, you end up with a completion list like:

#include <foo/ba^`
              bar.h>
              baz/
              backwards.h>

vs the current

#include <foo/ba^`
         <foo/bar.h>
         <foo/baz/
         <foo/backwards.h>

So I see your point here but I'm not sure the other behavior is actually better.

8057

This matches Driver's behavior (I'd reuse it but I don't think Sema can/should depend on Driver). Clang doesn't think "HPP" is a valid c++ header filename.
I'll add a comment.

sammccall updated this revision to Diff 165593.Sep 14 2018, 2:31 PM
sammccall marked 9 inline comments as done.

Unify common completion code from angled/quoted strings in Lexer.
Handle #include paths with \ on windows (normalize them to /)
Document why we picked particular extensions for headers.
Increase per-dir listing limit to 2500.

  1. Updating D52076: [CodeComplete] Add completions for filenames in #include directives. #
  2. Enter a brief description of the changes included in this update.
  3. The first line is used as subject, next lines as comment. #
  4. If you intended to create a new revision, use:
  5. $ arc diff --create

:

Addressed the comments I was sure about.
A couple of open questions in SemaCodeComplete about the formatting of the completions, but want to know what you think.

lib/Sema/SemaCodeComplete.cpp
8046

Wow, I only have 300.

Bumped the limit to 2500. This is based on handwavy advice on the internet that bad filesystems get slow in the low thousands.

8076

That won't give an error, that will give undefined behavior! (If a new value is added and the warning is missed)
The current code will just skip over the directory in that case, which I think is fine. (We have -Werror buildbots)

(Unless you mean return a dummy value, which is a little to unusual for my taste)

Sorry for the late response.

Please see the comments about adding closing quotes and slashes. It does not seem to work in the clients that auto-add closing quotes or when completing inside existing includes.

lib/Lex/Lexer.cpp
1931

If we normalize, we probably want a config option for Windows users to be friendly to them.
No good layer to add the option, though (except a compiler flag, but that looks weird), so I don't think that's directly addressable.

Also, do we want to actually change slashes before the path component we are completing in the same include directive?

  1. If we do, I suspect we might have problems with the existing language clients. We ran into problems when trying to add . to -> fix-its.
  2. If we don't, the completion label would look differently from what's written. That's probably not a big deal.
2051

Ah, I see. Thanks for explanation. LG

lib/Sema/SemaCodeComplete.cpp
8021

Don't editors that add quotes/parens typically avoid duplicating them when they're typed/completed?

They do that on typing, but not on completions, which are treated literally, e.g. they'll an extra quote is actually added.
I've checked in VSCode and it produces double quotes in those cases. Not sure about the other editors that have this.
That looks like a corner case my intuition is that other editors won't do what we want either.

8028

Ergonomically, having the closing quote completed after a filename *feels* right - you can hit enter afterwards and go to the next line.

It definitely gives the best experience in Vim (which does not add closing quotes automatically) when writing new code.
The results are confusing in editors that insert quotes, e.g.

#include "foo/bar/^" --> #include "foo/bar/baz.h""

And when completing in existing code even in Vim:

#include "foo/bar^/baz.h" --> #include "foo/baraba//baz.h"
#include "foo/bar/baz.h^" --> #include "foo/bar/baz.h""

The added ergonomics is definitely nice in some cases, but the added confusion in other (not the most frequent, but still quite common) cases doesn't look nice.
WDYT?

8076

Agree, dummy values are definitely worse.
I thought it deterministically crashes when compiled without optimizations, so you get a runtime error in addition to the compilation warning.

But yeah, -Werror buildbots will quickly catch it anyway, so it's not a big deal.

sammccall updated this revision to Diff 165759.Sep 17 2018, 7:02 AM

Address comments.
Only complete last segment for better compatibility.

After experimenting, editor support for replacing non-identifier text before the cursor is... spotty at best.
So switched to just replacing the last path component.

(This makes me a bit sad because the completions now have closing quotes but not opening ones, which looks messy)

lib/Lex/Lexer.cpp
1931

So we can't reliably do the right thing here. e.g. consider completing a directory after #include < at the top of the file - which slash style?
I'm not sure that sometimes respecting \-style separators is better than never.
It's hard to see how we add an option for this, and unclear that it will help many people.

(Also this is totally nonstandard and the insertions we're suggesting will work. People can reasonably expect *some* friction from their tools if they're particular about an unusual style.)

Also, do we want to actually change slashes before the path component we are completing in the same include directive?

In the latest patch, these aren't part of the replacement range.

lib/Sema/SemaCodeComplete.cpp
8021

Added logic (to the lexer) to eat the closing quote.
This makes VSCode do the right thing. Some naive editors may still get this wrong though.

8028

(see above, we're now eating the closing quote)

8076

I was confused about this too, it works the way you describe for debug builds.

For NDEBUG, llvm_unreachable deliberately invokes UB (via intrinsic) to allow the compiler to optimize out the code handling this case.

ilya-biryukov added inline comments.Sep 17 2018, 8:14 AM
lib/Lex/Lexer.cpp
2086

This will probably break for backslash includes.
We should probably also handle backslashes here when LangOpts.MSVCCompat is true.

sammccall marked an inline comment as done.

Handle completion after a backslash in MSVC-compat mode.

ilya-biryukov accepted this revision.Sep 18 2018, 12:27 AM

LGTM

lib/Sema/SemaCodeComplete.cpp
8057

IIUC, Driver uses this to infer the language based on file extension (when no explicit -x flag was provided).
However, this does not stop anyone from using uppercase file extensions in practice (and I assume inferring on headers is also pretty rare, since all compiles start with source files anyway).
But this should rare in any case, I don't think it's a blocker.

This revision is now accepted and ready to land.Sep 18 2018, 12:27 AM
sammccall added inline comments.Sep 18 2018, 1:30 AM
lib/Sema/SemaCodeComplete.cpp
8057

Oh true, headers will work in practice with any extension, so Driver might be too restrictive.
Made these case insensitive.

This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.