This is an archive of the discontinued LLVM Phabricator instance.

clang-format: [JS] detect C++ keywords.
ClosedPublic

Authored by mprobst on Apr 2 2020, 7:55 AM.

Details

Summary

C++ defines a number of keywords that are regular identifiers in
JavaScript, e.g. concept:

const concept = 1; // legit JS

This change expands the existing IsJavaScriptIdentifier(Tok) function
to return false for C++ keywords that aren't keywords in JS.

Diff Detail

Event Timeline

mprobst created this revision.Apr 2 2020, 7:55 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 2 2020, 7:55 AM
jfb added a comment.Apr 2 2020, 8:38 AM

Some of these are technically "future reserved keywords": https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#Keywords

Wouldn't it be better to list all of JS's keywords at this point?

krasimir added a comment.EditedApr 2 2020, 10:04 AM
In D77311#1957367, @jfb wrote:

Some of these are technically "future reserved keywords": https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#Keywords

Wouldn't it be better to list all of JS's keywords at this point?

+1 to consider listing all of JS's keywords instead of listing all C++ keywords that are not JS keywords. With the current approach, any time a new C++ keyword token gets introduced, the switch in IsJavaScriptIdentifier would have to be updated.

But turning the implementation into something that lists JS keywords seems a bit tricky. Just a suggestion: maybe we can use the trick to #include TokenKinds.def used in TokenKinds.h:

bool IsJavaScriptIdentifier(const FormatToken &Tok) const {
switch (Tok.Tok.getKind()) {
  // list all JS keywords defined as KEYWORD in TokenKinds.def
  case tok::kw_break:
  case tok::kw_case:
  ... 
  case tok::kw_return:
  return false;

  // All of the remaining C keywords are JS identifiers.
  #define KEYWORD(X,Y) case tok::kw_ ## X:
  #include "clang/Basic/TokenKinds.def"
  // #undef KEYWORD is not needed -- it's #undef-ed at the end of TokenKinds.def
  return true;

  // handle identifiers and everything else as in the patch.
  case tok::identifier:
  default: return false;
}
krasimir requested changes to this revision.Apr 3 2020, 2:58 AM
This revision now requires changes to proceed.Apr 3 2020, 2:58 AM
mprobst updated this revision to Diff 254740.Apr 3 2020, 3:39 AM
  • - improve handling of keywors - rather than blacklisting all C++

Thanks for the feedback. Indeed, whitelisting the JS keywords is better, but we still needed a way to blacklist all C++ keywords then - Krasimir's suggestion with the #include did the trick, thanks.

mprobst updated this revision to Diff 254741.Apr 3 2020, 3:42 AM
  • fix formatting
MyDeveloperDay added a project: Restricted Project.Apr 3 2020, 3:47 AM
MyDeveloperDay accepted this revision.Apr 3 2020, 4:13 AM
MyDeveloperDay added a subscriber: MyDeveloperDay.

I think this LGTM and brings more clarity as its much easier to look for things in the positive than the negative

krasimir accepted this revision.Apr 3 2020, 4:15 AM

Thank you!

This revision is now accepted and ready to land.Apr 3 2020, 4:15 AM
Harbormaster completed remote builds in B51606: Diff 254741.
krasimir added inline comments.Apr 3 2020, 4:17 AM
clang/lib/Format/FormatToken.h
913–970
mprobst marked 2 inline comments as done.Apr 3 2020, 5:03 AM
mprobst added inline comments.
clang/lib/Format/FormatToken.h
913–970

Done, though I used the TS list of keywords (as it's a superset of the JS keywords).

This revision was automatically updated to reflect the committed changes.
mprobst marked an inline comment as done.