This is an archive of the discontinued LLVM Phabricator instance.

[Option] Add 'findNearest' method to catch typos
ClosedPublic

Authored by modocache on Jan 4 2018, 10:07 AM.

Details

Summary

Add a method OptTable::findNearest, which allows users of OptTable to
check user input for misspelled options. In addition, have llvm-mt
check for misspelled options. For example, if a user invokes
llvm-mt /oyt:foo, the error message will indicate that while an
option named /oyt: does not exist, /out: does.

The method ports the functionality of the LookupNearestOption method
from LLVM CommandLine to libLLVMOption. This allows tools like Clang
and Swift, which do not use CommandLine, to use this functionality to
suggest similarly spelled options.

As room for future improvement, the new method as-is cannot yet properly suggest
nearby "joined" options -- that is, for an option string "-FozBar", where
"-Foo" is the correct option name and "Bar" is the value being passed along
with the misspelled option, this method will calculate an edit distance of 4,
by deleting "Bar" and changing "z" to "o". It should instead calculate an edit
distance of just 1, by changing "z" to "o" and recognizing "Bar" as a
value. This commit includes a disabled test that expresses this limitation.

Test Plan: check-llvm

Event Timeline

modocache created this revision.Jan 4 2018, 10:07 AM
jroelofs added inline comments.
lib/Option/OptTable.cpp
258

I think you can use range-for via:

for (const Info &CandidateInfo : ArrayRef(OptionInfos).drop_front(FirstSearchableIndex))
260

The ctor call here isn't needed, it's implicit in the assign if you drop it.

293

I think this is a dangling reference to a dead temporary. Make Delimiter a std::string?

301

std::tie(LHS, RHS) = Option.split(Last)

tools/llvm-mt/llvm-mt.cpp
29

Include raw_ostream.h instead, and use llvm::raw_string_ostream instead of ostringstream.

modocache updated this revision to Diff 128665.Jan 4 2018, 3:52 PM

Thanks for the review, @jroelofs! I think I addressed your comments and adopted your suggestions.

modocache edited the summary of this revision. (Show Details)Jan 4 2018, 3:53 PM
ruiu added inline comments.Jan 4 2018, 11:17 PM
lib/Option/OptTable.cpp
280

I wonder why you chose to use the prefix matching instead of edit distance. Conveniently, StringRef has edit_distance function, so you can use that if you want.

jroelofs added inline comments.Jan 5 2018, 4:10 AM
lib/Option/OptTable.cpp
280

+1

308

These are the same whether RHS is empty or not.

modocache marked 4 inline comments as done.Jan 5 2018, 4:29 AM
modocache added inline comments.
lib/Option/OptTable.cpp
280

Ah, my apologies, I think this code may not be clear.

Just to make sure we're on the same page: this loop isn't matching input string --hel to valid option --help, it's matching each valid option's set of prefixes (for most LLVM programs' options, those are -, --, or both) to --hel.

The goal here is to find the prefix that matches the one the user provided. The reason I'm doing this is because, below when we compare using StringRef::edit_distance, we get an edit distance of 1 between --hel and both --help (add one p) and -help (remove one -).

But I think it would be nice, if the user provided the input --hel, for LLVM to suggest an option with the same prefix, so I'd like to deterministically return --help. That is, I do *not* want to return -help.

Maybe that's not as desirable behavior as I think it is. But, in order to implement it, I believe I'm unable to use StringRef::edit_distance here. I need to check whether a prefix (of any length, although conventionally they're one or two characters in most LLVM programs) exists at the front of both a valid option and the option string being passed in. StringRef::startswith seems like a natural fit for this, as opposed to StringRef::edit_distance.

modocache updated this revision to Diff 128732.Jan 5 2018, 4:34 AM

Remove unnecessary branch for RHS.empty(). Thanks, @jroelofs!

jroelofs added inline comments.Jan 5 2018, 4:36 AM
lib/Option/OptTable.cpp
280

Ohh I see, you are using edit_distance below for sorting the actual option name matches, and this is just a heuristic tweak to give preference to options with the same lead characters as what the user wrote.

modocache marked an inline comment as done.Jan 5 2018, 4:37 AM

+1 Thanks for working on this. Great to see this happening!

jroelofs added inline comments.Jan 5 2018, 4:48 AM
lib/Option/OptTable.cpp
301

I think NormalizedName is also a dangling reference to a temporary. It needs to be a std::string since you're concatenating.

modocache updated this revision to Diff 128737.Jan 5 2018, 5:43 AM

Fixed a dangling reference to concatenated string and re-ran clang-format. Thanks, @jroelofs!

modocache marked an inline comment as done.Jan 5 2018, 5:46 AM
jroelofs accepted this revision.Jan 5 2018, 8:16 AM

LGTM

This revision is now accepted and ready to land.Jan 5 2018, 8:16 AM
modocache closed this revision.Jan 5 2018, 9:11 AM

Great, thanks for the reviews, everyone! Some of you have already commented on https://reviews.llvm.org/D41733, which adds "did you mean?" suggestions to Clang, but let me know if there's anything else I can do on that revision.