This is an archive of the discontinued LLVM Phabricator instance.

[TableGen][AsmMatcherEmitter] Only parse tokens following separators as registers
ClosedPublic

Authored by ab on May 18 2015, 7:24 PM.

Details

Summary

This fixes PR23455, where, when TableGen generates the matcher from the AsmString, it splits "cmp${cc}ss" into tokens, and the "ss" suffix is recognized as the SS register.

I can't think of a situation where that's a feature, not a bug, hence this patch: when a token doesn't follow a separator (and is part of a "word"; wording better than "NewTok" much appreciated), it shouldn't be parsed as a register.

Thoughts?

-Ahmed

Diff Detail

Repository
rL LLVM

Event Timeline

ab updated this revision to Diff 26030.May 18 2015, 7:24 PM
ab retitled this revision from to [TableGen][AsmMatcherEmitter] Only parse tokens following separators as registers.
ab updated this object.
ab edited the test plan for this revision. (Show Details)
ab added reviewers: tstellarAMD, craig.topper.
ab set the repository for this revision to rL LLVM.
ab added a subscriber: Unknown Object (MLST).
craig.topper edited edge metadata.May 18 2015, 10:23 PM

Does this still allow the problem with a hypothetical string of ss${cc}cmp?

utils/TableGen/AsmMatcherEmitter.cpp
971 ↗(On Diff #26030)

separator

ab updated this revision to Diff 26198.May 20 2015, 5:53 PM
ab edited edge metadata.
ab removed rL LLVM as the repository for this revision.

Does this still allow the problem with a hypothetical string of ss${cc}cmp?

Indeed it does; we can look in both directions, see new patch.

  • Look at all the separators, not just space and comma.
  • Add the PR testcase.

There's a special case for '$$', because Mips uses '$$reg', as '$' is the RegisterPrefix, and the best way of escaping it is '$$'. '\\$reg' will tokenize into separate '$' and 'reg', which isn't OK.
Changing that to not consider '\' a separator is a whole 'nother can of worms, as that leads to behavior changes on:

"\\{$Vd, $dst2, $dst3, $dst4\\}"

where we now think "dst4\}" is a single token.

This seems useful. LGTM.

This revision was automatically updated to reflect the committed changes.