This is an archive of the discontinued LLVM Phabricator instance.

[MC] AsmLexer: add extensible identifier's character set support.
Needs RevisionPublic

Authored by vpykhtin on Feb 16 2016, 8:16 AM.

Details

Summary

Working on AMDGPU project I need to support assembler identifiers started with '&'. Looking into AsmLexer.cpp I found similar need to optionally support '@' inside identifiers and decided this is time to add generic support for identifier charset. I added configurable bitvector set for prefix and body identifier's characters.

Event Timeline

vpykhtin updated this revision to Diff 48072.Feb 16 2016, 8:16 AM
vpykhtin retitled this revision from to [MC] AsmLexer: 30% speedup on tests, added extensible identifier's character set support..
vpykhtin updated this object.
vpykhtin added reviewers: grosbach, arsenm, ddunbar.
vpykhtin set the repository for this revision to rL LLVM.
vpykhtin added a project: Restricted Project.
vpykhtin added a subscriber: nhaustov.

Kind reminder if someone can take a look at this.

rafael added inline comments.
include/llvm/MC/MCParser/MCAsmLexer.h
238

Why do you need to make these virtual?

243

is..Contains is a strange name since it has two verbs.

lib/MC/MCParser/AsmLexer.cpp
33

Why?

vpykhtin added inline comments.Mar 1 2016, 11:34 AM
include/llvm/MC/MCParser/MCAsmLexer.h
238

Well not making it virtual would require bitvector sets to be part of this class. I'm not objecting though as it already done with SkipSpace and AllowAtInIdentifier.

243

What would be a better name here?

lib/MC/MCParser/AsmLexer.cpp
33

Well it based on my previuos experience on Windows where we had lexer using these routines eating up to 10% of scan time. Probably not so "generally" as I stated though. I'm not insisting on this particular change and can remove it.

ddunbar resigned from this revision.Sep 1 2016, 8:26 PM
ddunbar removed a reviewer: ddunbar.
vpykhtin updated this revision to Diff 78513.Nov 18 2016, 6:24 AM
vpykhtin retitled this revision from [MC] AsmLexer: 30% speedup on tests, added extensible identifier's character set support. to [MC] AsmLexer: add extensible identifier's character set support..
vpykhtin updated this object.

After a loooong time I would like to reanimate this review requiest.

Previously I incorrectly measured performance impact for this patch and obtained 30% performance gain - this result was incorrect. Current measurement on a large .s file shows no affect on parsing performance.

grosbach requested changes to this revision.Dec 9 2016, 2:58 PM
grosbach edited edge metadata.
grosbach added inline comments.
include/llvm/MC/MCParser/MCAsmLexer.h
238

With the generalization, these can go away entirely, yes? Replace the callsites w/ the new API.

246

This should start with "is" not "Is" according the the coding guidelines.

251

Ditto.

258

This feels really weird. Wouldn't any callsites want to be using one of the other two? They'll know their context. I don't see any invocations of this method in the patch. Why is it needed at all?

lib/MC/MCParser/AsmLexer.cpp
576

Can you elaborate on this bit? Not sure I follow why this is so much more logic than previously.

lib/MC/MCParser/MCAsmLexer.cpp
37

Given the bimodal behaviour based on Value, this should probably just be two functions.

This revision now requires changes to proceed.Dec 9 2016, 2:58 PM
arsenm resigned from this revision.Apr 5 2020, 8:26 AM