This is an archive of the discontinued LLVM Phabricator instance.

[AsmParser] Mnemonic Spell Corrector
ClosedPublic

Authored by SjoerdMeijer on May 12 2017, 6:53 AM.

Details

Summary

This implements a spell corrector to suggest another mnemonic when an invalid one is specified, for example:

$ echo "adXd r1,r2,#3" | llvm-mc -triple arm
<stdin>:1:1: error: invalid instruction, did you mean: add, qadd?
adXd r1,r2,#3
^

It is target agnostic, but as a first step I have added it only to the ARM backend.

Diff Detail

Repository
rL LLVM

Event Timeline

SjoerdMeijer created this revision.May 12 2017, 6:53 AM
rengolin edited edge metadata.May 18 2017, 6:34 AM

Hi Sjoerd,

I think this is a nice addition, and if we can trim all the duplication/public issues, the extra computation makes no difference, since this is going to run only when the compiler is on its way out.

The things I'd change from this, you probably already know, but just for completeness sake: :)

  • Add the check functions to a cpp file, probably an already existing one that handles asm errors
  • Keep things private, and if needed, maybe even duplicate the info (computation is not a problem here)
  • Show what change you'd need to do to the original table (maybe a new column?), so we can see what would be the cost
  • As you said, some tests would be nice

The only thing I worry here is that instructions rarely behave like text. So the Levenshtein distance may work for "addx -> add" but I'm expecting a lot of bad proposals, maybe enough to make the whole exercise worse than not having it.

This could be really nice if expanded to understand the operands, for example, and restrict to a subset (say, only opcodes that can use quad-regs), or something. In that case, the number of false positives would be lower and the quality of the proposals much better. But obviously, this wouldn't fit this patch.

So, if we have a plan to really make this target agnostic, but maybe with some target-hooks for the special filters, then I think it's a good idea. If the plan is to just use the Levenshtein distance, then it may be too much hassle.

Makes sense?

cheers,
--renato

Hi Renato,

Makes perfect sense, and thanks for looking into this. I will first start with making and adding tests to see if the suggestions are accurate. I've obviously tried it on a number of cases and the suggestions were spot on, but perhaps I've missed something. The outcome of this exercise will then show whether this is all worth it or not.

Cheers,
Sjoerd.

I am now reusing the already existing tablegen'ed MnemonicTable; so no need to introduce another one. That's the reason I've moved the algorithm to the tablegen'ed files, because that's where definitions/datastructures live.

Algorithmically I have made a few tweaks:

  • it now returns a list of candidates/suggestions (rather than arbitrarily just the first one)
  • to avoid giving useless suggestions, I have restricted to only consider candidates that are 2 edits away.

The added regression test should give a good impression what to expect from this patch and the given suggestions.

This looks quite useful. One suggestion I have after playing around with this patch is to restrict the suggestions based on the available feature bits as that narrows the list of possibilities quite nicely.

Thanks! I've added the check for the available feature bits.
Also fixed the regression test that had some wrong tags.

Added a test case and a fix for checking supported architecture features.

Nits inlined. Also you should update the summary.

Thanks,
Simon

test/MC/ARM/invalid-instructions.s
3 ↗(On Diff #103005)

Add one line comment explaining that this is testing the mnemonic spell checker.

46 ↗(On Diff #103005)

resuling -> resulting

utils/TableGen/AsmMatcherEmitter.cpp
2718 ↗(On Diff #103005)

Add a string documenting this assertion such as "Empty string for assembly mnemoic!"

2765 ↗(On Diff #103005)

"Edit distance.." -> "Candidates with an edit distance.."

SjoerdMeijer edited the summary of this revision. (Show Details)

Thanks Simon! I've addressed your comments. Also, I renamed the regression test to invalid-instructions-spellcheck.s just to be even more explicit about the purpose of the test.

sdardis edited edge metadata.Jun 19 2017, 7:01 AM

Sorry for missing this earlier, but the <Target>Operand class holds the mnemonic as a StringRef, and the matcher table ->getMnemonic() also returns a StringRef. Happily enough that class already has a edit_distance function: http://llvm.org/docs/doxygen/html/classllvm_1_1StringRef.html#a532f28c842d3728dc8f7e3c40949b33f

Thoughts on using that instead? It would avoid unnecessary std::string construction.

Thanks,
Simon

Oh wow, nice find! Completely missed that too. I am adding this, and that will simplify things a lot!

SjoerdMeijer edited the summary of this revision. (Show Details)

Now using StringRef and its edit_distance function.

sdardis accepted this revision.Jun 19 2017, 10:14 AM

LGTM. We should see if @rengolin has any further comments.

utils/TableGen/AsmMatcherEmitter.cpp
2726 ↗(On Diff #103060)

Space after the '//'.

2732 ↗(On Diff #103060)

Space after the if.

This revision is now accepted and ready to land.Jun 19 2017, 10:14 AM

Hi @rengolin, Simon had some great suggestions: using the architecture features to narrow down the number of candidates, and the use of the existing stringref edit distance function, which makes this now a fairly straightforward patch. Is it okay to commit this (with the last nits fixed)? Cheers.

rengolin accepted this revision.Jul 5 2017, 3:43 AM

Hi Sjoerd,

Sorry for the delay. Yes, this looks simple enough and quite useful. LGTM.

Thanks!
--renato

This revision was automatically updated to reflect the committed changes.