This is an archive of the discontinued LLVM Phabricator instance.

Cleanup identifier parsing.
ClosedPublic

Authored by cor3ntin on Aug 18 2021, 10:39 AM.

Details

Summary
  • Rename methods to clearly signal when they only deal with ASCII
  • Simplify the parsing of identifier
  • use start/continue instead of head/body for consistency with

Unicode terminology

Diff Detail

Event Timeline

cor3ntin created this revision.Aug 18 2021, 10:39 AM
cor3ntin requested review of this revision.Aug 18 2021, 10:39 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 18 2021, 10:39 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
cor3ntin retitled this revision from Cleanup identifier parsing. to [WIP] Cleanup identifier parsing..Aug 18 2021, 10:44 AM
cor3ntin removed projects: Restricted Project, Restricted Project.
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptAug 18 2021, 10:44 AM
cor3ntin updated this revision to Diff 367322.Aug 18 2021, 2:11 PM

Spell ASCII in upper case

Herald added a project: Restricted Project. · View Herald TranscriptAug 18 2021, 2:11 PM
cor3ntin updated this revision to Diff 367325.Aug 18 2021, 2:42 PM

Looks better in lower case after all

cor3ntin updated this revision to Diff 367326.Aug 18 2021, 2:44 PM

Remove file committed accidentally

@aaron.ballman Let me know what you think.
The PR does not contain new behavior, only renames and refactor the function lexing identifiers. I ran the build a few times and did not measure performance differences on my system. The code should behave exactly the same except with one loop instead of 3.
I also moved the 2 identifier lexing functions near one another to make it easier to understand.

This makes it apparent that some places in tools, maybe header names or module parsing too only check for ASCII identifiers when they may want to check for Unicode, This is not addressed here.

aaron.ballman added a subscriber: rsmith.

In general, I'm in favor of these changes. They help identify (pun *totally* intended) where we're improperly expecting ASCII identifiers in places, which can hopefully be addressed in follow-up work. @rsmith, do you have any concerns with this direction?

Can you remove the [WIP] from the title so it's clear that this is no longer in progress? Also, I'd recommend slapping an NFC in the title somewhere to make it clear there's no functional changes intended.

clang/include/clang/Lex/Lexer.h
702

Should this be LexUnicodeIdentifierContinue()? If so, perhaps it can also be moved up to line 578 so it's near the "start" function?

Or does this function handle both Unicode and ASCII identifiers? If so, the comments could probably be updated.

clang/lib/Lex/Lexer.cpp
1758

Is the comment here still accurate? Might be worth rewriting in prose rather than regex?

1762
1767
1783–1788
1789
cor3ntin added inline comments.Aug 25 2021, 12:25 PM
clang/include/clang/Lex/Lexer.h
702

This handles all identifiers - after the first codepoint has been parsed - Which comment are you referring to?

clang/lib/Lex/Lexer.cpp
1758

I don't think the comment was accurate before, I'll find somehing better!

aaron.ballman added inline comments.Aug 25 2021, 12:27 PM
clang/include/clang/Lex/Lexer.h
701

Something like this then?

cor3ntin updated this revision to Diff 368746.Aug 25 2021, 2:53 PM

Fix comments following Aaron's feedback, remove
braces deemed unecessary by the guidelines

cor3ntin marked 4 inline comments as done.Aug 25 2021, 2:53 PM
cor3ntin added inline comments.
clang/include/clang/Lex/Lexer.h
702

I kept the comment as is - because it applies to all function underneath, but added a comment in the definition in the cpp

cor3ntin retitled this revision from [WIP] Cleanup identifier parsing. to Cleanup identifier parsing..Aug 25 2021, 2:54 PM
aaron.ballman accepted this revision.Aug 30 2021, 5:15 AM

The changes LGTM, modulo a commenting request. Let's wait a bit before landing in case @rsmith has concerns.

clang/include/clang/Lex/Lexer.h
702

It'd be a bit better to add the comments here; the header file is where users often first find some functionality, so that's where the comments explaining the high-level functionality of how to call the function are most useful. Also, IDEs often pick up comments "attached to" declarations and display them automatically for users which is a nice benefit.

This revision is now accepted and ready to land.Aug 30 2021, 5:15 AM
cor3ntin updated this revision to Diff 369446.Aug 30 2021, 7:57 AM

Move a comment to the header file per Aaron request

aaron.ballman closed this revision.Sep 14 2021, 6:13 AM

Thank you for the cleanup! I've landed in 601102d282d5e9a1429fea52ee17303aec8a7c10.