This is an archive of the discontinued LLVM Phabricator instance.

[clangd] Implement a Proof-of-Concept tool for symbol index exploration
ClosedPublic

Authored by kbobyrev on Sep 4 2018, 6:45 AM.

Diff Detail

Repository
rL LLVM

Event Timeline

kbobyrev created this revision.Sep 4 2018, 6:45 AM
kbobyrev updated this revision to Diff 163800.Sep 4 2018, 6:53 AM

Cleaned up few minor issues.

Definitely like the idea of the tool. The main complication seems to be parsing of user input at this point.
I suggest we explore an option proposed before, that is reusing the LLVM command-line parser (see inline comment too).
If that turns out to be much work, we could explore rolling out a simple parser for commands on our own.

clang-tools-extra/clangd/dexplorer/Dexplorer.cpp
39 ↗(On Diff #163800)

Maybe we could expose CommandLineParser from llvm/lib/Support/CommandLine.cpp as a public API and use it here?
Not sure if there are any obstacles to doing so or how much work is it, though.
E.g. cl::opt seem to rely on being registered in the global parser and I'm not sure if there's an easy way out of it.

147 ↗(On Diff #163800)

+1 to this FIXME.

Something like:

template <class Func>
auto reportTime(StringRef Name, Func F) -> decltype(F()) {
  auto Result = F();
  llvm::outs() << Name << " took " << ...
  return Result;
}

The code calling this API would be quite readable:

auto Index = reportTime("Build stage", []() { 
  return buildStaticIndex(...);
});
kbobyrev added inline comments.Sep 9 2018, 3:20 PM
clang-tools-extra/clangd/dexplorer/Dexplorer.cpp
39 ↗(On Diff #163800)

Yes, that might be tricky. I have thought about a few options, e.g. consuming the first token from the input string as the command and dispatching arguments parsed via CommandLineParser manually (discarding those which are not accepted by provided command, etc). There are still few complications: help wouldn't be separate and easily accessible (unless hardcoded, which is suboptimal).

Another approach I could think of would be simplifying the interface of each command so that it would be easily parsed:

  • > fuzzy-find request.json: this would require FuzzyFindRequest JSON (de)serialization implementation, but that would be useful for the benchmark tool, too
  • > lookup-id symbolid
  • > load symbol.yaml/swap symbol.yaml
  • > density-hist
  • > tokens-distribution
  • > dense-tokens
  • > sparse-tokens

And also a generic > help for the short reference of each command. Out of all these commands, only Fuzzy Find request would benefit a lot from a proper parsing of arguments, having JSON file representation might not be ideal, but it looks OK for the first iteration for me. Fuzzy Find request would presumably be one of the most used commands, though, but then again, we could iterate if that is not really convinient.

147 ↗(On Diff #163800)

Ah, this looks really clean! I had few ideas of how to do that, but I couldn't converge to a reasonable solution which wouldn't look like abusing C++ to me :) Thanks!

ilya-biryukov added inline comments.Sep 10 2018, 2:19 AM
clang-tools-extra/clangd/dexplorer/Dexplorer.cpp
39 ↗(On Diff #163800)

The single-argument and no-arg commands certainly look nice. However, creating a separate file for each fuzzy-find request feels like a pain.

Have you tried prototyping parsing with CommandLineParser ? What are the complications to exposing it from LLVM?

A tool like this will be a useful addition!

I think the big high-level questions are:

  • UI: a REPL seems like a good model, but we need a more user-friendly way to read commands
  • Testing: need a plan for either a) testing the commands or b) keeping the commands simple/readable enough that we can get away without testing them.

I suspect a good design links these questions together (e.g. by establishing a simple pattern for reading/printing to keep the eval part simple, and read/print is most of the UI).

I'm not sure I agree this should be deferred until later.

clang-tools-extra/clangd/CMakeLists.txt
75 ↗(On Diff #163800)

if this is coupled to dex, and dex has its own directory, this should be a subdirectory I think

clang-tools-extra/clangd/dexplorer/Dexplorer.cpp
27 ↗(On Diff #163800)

please avoid mentioning YAML, as we now have multiple formats.
"index file"?

41 ↗(On Diff #163800)

find references

62 ↗(On Diff #163800)

Agree with the concerns about the usability of this model, a command-like model would be nicer.
If we want to start with something simple without worrying too much about complex inputs, maybe just accept queries one-per-line and don't allow the other options to be set yet?

Otherwise it seems like all of this is likely to need to be replaced in a backwards-incompatible way...

122 ↗(On Diff #163800)

I'm not sure this will actually be a good UX.
Maybe just make sure you always include the symbol ID so the user can get details?

147 ↗(On Diff #163800)

+1 - could you add this in this patch? would improve readability already

kbobyrev updated this revision to Diff 164834.Sep 11 2018, 3:23 AM
kbobyrev marked 8 inline comments as done.

Address a round of comments; implement a dummy ad-hoc subcommand parser.

kbobyrev added inline comments.Sep 11 2018, 3:32 AM
clang-tools-extra/clangd/dexplorer/Dexplorer.cpp
39 ↗(On Diff #163800)

I didn't, but I looked closely at clang-query which decided not to do that IIUC due to the complications with the CommandLineParser and also the specifics of the tool and clang-refactor which utilizes CommandLineRefactoringOptionVisitor and RefactoringActionSubcommand internal implementations, but overall I don't think it looks very simple.

We had a discussion with Eric and Sam, the consensus was that we should start with something very simple and iterate on that if we decide that more complicated interface is necessary.

Dexplorer is a long name. How about shortening it to dexp?

clang-tools-extra/clangd/index/dex/dexplorer/DExplorer.cpp
56 ↗(On Diff #164834)

Why do we want TimeUnit?
It adds complexity, if we want to make large values more readable we have other alternatives:

  • printing adaptive units, e.g. "when it's larger than 5000ms, start printing seconds", "when it's larger than 5minutes, start printing minutes"
  • provide separators (500000000ms is pretty much unreadable, 500 000 000ms is much better)
125 ↗(On Diff #164834)

This function does too many things:

  • parses the input.
  • dispatches on different kinds of requests.
  • handles user input errors.
  • actually runs the commands.

This turns out somewhat unreadable at the end, we might want to separate those into multiple functions. Will need a bit of time to think on how to actually do it best. Will come back after I have concrete suggestions.

139 ↗(On Diff #164834)

This produces a ton of output.
Could we simply produce a message to run 'help' to get a list of supported commands instead?

It could be really annoying to see large help messages on every typo.

sammccall added inline comments.Sep 11 2018, 4:50 AM
clang-tools-extra/clangd/index/dex/dexplorer/DExplorer.cpp
23 ↗(On Diff #164834)

why?

56 ↗(On Diff #164834)

+1 just pick ms or us for now

You can use formatv({0:ms}, Duration) to print neatly, no need for the traits.

79 ↗(On Diff #164834)

nit: this starts with a newline

82 ↗(On Diff #164834)

nit: just "find" for convenience?

87 ↗(On Diff #164834)

examples use lookup-id, code says lookup.
(I prefer lookup FWIW)

90 ↗(On Diff #164834)

(Not sure the "help" doc or "press ctrl-d to exit" are particularly useful)

100 ↗(On Diff #164834)

SymbolIndex&, you don't need the unique_ptr

100 ↗(On Diff #164834)

nit: lookup (the "request" is the arg)

103 ↗(On Diff #164834)

why not print them on-the-fly?
(this isn't safe, the data backing the symbols may not live long enough)

106 ↗(On Diff #164834)

please be consistent in using . or | as separator

106 ↗(On Diff #164834)

I'd suggest putting the ID first, as it's fixed-width so the columns will actually be columns

125 ↗(On Diff #164834)

For now can we keep it simple:

  • this function can split the command from the rest, and pass the rest of the args to fuzzyFindRequest etc
  • if the command is unknown, the dispatcher reports the error
  • if the command is known, the command handler reports input errors (this could be cleaner/more declarative, but we don't want to build a big framework in this patch)
  • this function could do the timing - the request construction is negligible and that keeps the command handlers focused on their job
130 ↗(On Diff #164834)

REPLs don't usually make this an error, just return

157 ↗(On Diff #164834)

as above, resist the temptation to dump the help on every opportunity...
just "Unknown command. Try 'help'." or so is more friendly I think

167 ↗(On Diff #164834)

fixme is done

176 ↗(On Diff #164834)

still says YAML

kbobyrev updated this revision to Diff 164865.Sep 11 2018, 6:56 AM
kbobyrev marked 19 inline comments as done.

Address comments.

sammccall accepted this revision.Sep 11 2018, 7:31 AM

Really just details now.

clang-tools-extra/clangd/index/dex/dexp/Dexp.cpp
56 ↗(On Diff #164865)

(I don't think you want to share this between fuzzyfind and lookup, see below)

62 ↗(On Diff #164865)

unused

63 ↗(On Diff #164865)

why? (and below)

If you want all commands to be separated by a blank line, the dispatcher can do it

64 ↗(On Diff #164865)

you have this comment twice

90 ↗(On Diff #164865)

note that you're looking up one ID so you'll always get 0 or 1 result

95 ↗(On Diff #164865)

Rank is never meaningful for lookup, especially when only looking up one id

97 ↗(On Diff #164865)

this isn't enough detail to be really useful. I'd suggest dumping the whole symbol as YAML, or printing 'not found' if no result.

clang-tools-extra/clangd/index/dex/dexplorer/DExplorer.cpp
125 ↗(On Diff #164834)

this is not done (dispatchRequest is still doing validation for each command)

130 ↗(On Diff #164834)

this is not done

180 ↗(On Diff #164834)

dexp?

This revision is now accepted and ready to land.Sep 11 2018, 7:31 AM
ilya-biryukov added inline comments.Sep 11 2018, 7:42 AM
clang-tools-extra/clangd/index/dex/dexp/CMakeLists.txt
10 ↗(On Diff #164865)

Should we indent closing parens to the opening one or keep at the start of the line?
Let's pick one style and be consistent (the best option is being consistent with the rest of LLVM/clangd)

clang-tools-extra/clangd/index/dex/dexp/Dexp.cpp
46 ↗(On Diff #164865)

NIT: Now that we ignore the return value, we could even remove templates:

void reportTime(StringRef Name, llvm::function_ref<void()> F);
kbobyrev updated this revision to Diff 164893.Sep 11 2018, 8:38 AM
kbobyrev marked 9 inline comments as done.

Address comments

ilya-biryukov added inline comments.Sep 11 2018, 8:42 AM
clang-tools-extra/clangd/index/dex/dexp/Dexp.cpp
55 ↗(On Diff #164893)

Is this a leftover from previous changes?

kbobyrev updated this revision to Diff 165016.Sep 12 2018, 12:33 AM
kbobyrev marked an inline comment as done.

Remove artifact comment.

This revision was automatically updated to reflect the committed changes.