This is an archive of the discontinued LLVM Phabricator instance.

[analyzer] Ignore flex generated files
ClosedPublic

Authored by steakhal on Nov 24 2021, 12:14 AM.

Details

Summary

Some projects [1,2,3] contain flex-generated files in addition to bison-generated ones.
Unfortunately, the header comment "/* A lexical scanner generated by flex */"
emitted by flex is not necessarily at the beginning of the file, so we need
to quickly scan the whole file for this needle string.

Fortunately, StringRef can perform this search efficiently.

As a consequence, the bison comment is no longer required to be at the very
beginning of the file either. This allows us to detect a few more cases
[4,5,6].
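The difference between requiring the bison comment at offset 0 and accepting it anywhere can be sketched as follows. The needle "/* A Bison parser, made by" is an assumption based on the comment GNU Bison usually places at the top of its output; the function names are hypothetical.

```cpp
#include <string_view>

// Assumed needle: Bison output typically opens with
// "/* A Bison parser, made by GNU Bison X.Y.  */".
constexpr std::string_view BisonHeaderPrefix = "/* A Bison parser, made by";

// Stricter check: the comment must start at the very first byte.
// (substr clamps the count, so short buffers are handled safely.)
bool bisonCommentAtStart(std::string_view Buffer) {
  return Buffer.substr(0, BisonHeaderPrefix.size()) == BisonHeaderPrefix;
}

// Relaxed check: the comment may appear anywhere in the buffer.
bool bisonCommentAnywhere(std::string_view Buffer) {
  return Buffer.find(BisonHeaderPrefix) != std::string_view::npos;
}
```

For instance, a header whose comment sits on line 2 (as in the files linked in [4,5,6]) is missed by the first check but found by the second.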

Alternatively, we could allow only whitespace characters before the
bison/flex header comment. That would avoid the (probably) unnecessary
string search through the buffer. However, I could not verify that these
tools actually guarantee this.
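A sketch of that rejected alternative, assuming "whitespace" means anything std::isspace accepts. The function name is hypothetical. Its cost is bounded by the leading whitespace plus the needle length rather than by the size of the file.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

// Hypothetical alternative heuristic: skip leading whitespace, then require
// the header comment to start exactly there. No full-buffer scan is needed.
bool headerAfterLeadingWhitespace(std::string_view Buffer,
                                  std::string_view Needle) {
  std::size_t I = 0;
  while (I < Buffer.size() &&
         std::isspace(static_cast<unsigned char>(Buffer[I])))
    ++I;
  // I <= Buffer.size(), so substr cannot throw here.
  return Buffer.substr(I, Needle.size()) == Needle;
}
```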

In addition, the Twin project [1], for example, has other non-whitespace
characters (some preprocessor directives) before the flex-generated
header comment, so the heuristic in the previous paragraph would not work for it.
Thus, I would advocate for the current implementation.
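To illustrate the argument, the sketch below runs both heuristics on a made-up buffer in which preprocessor lines precede the flex comment. The buffer is only loosely modeled on the situation described for Twin [1] and is not copied from its sources; all names are hypothetical.

```cpp
#include <cctype>
#include <cstddef>
#include <string_view>

constexpr std::string_view FlexHeaderComment =
    "/* A lexical scanner generated by flex */";

// Whitespace-only heuristic: only whitespace may precede the comment.
bool whitespaceOnlyHeuristic(std::string_view Buffer) {
  std::size_t I = 0;
  while (I < Buffer.size() &&
         std::isspace(static_cast<unsigned char>(Buffer[I])))
    ++I;
  return Buffer.substr(I, FlexHeaderComment.size()) == FlexHeaderComment;
}

// Heuristic advocated in the summary: look for the comment anywhere.
bool substringHeuristic(std::string_view Buffer) {
  return Buffer.find(FlexHeaderComment) != std::string_view::npos;
}

// Illustrative input: preprocessor directives come first.
constexpr std::string_view PreprocessorFirstBuffer =
    "#line 2 \"lexer.cpp\"\n"
    "\n"
    "#define YY_INT_ALIGNED short int\n"
    "\n"
    "/* A lexical scanner generated by flex */\n";
```

The whitespace-only check stops at the `#` and rejects the buffer, while the substring search still finds the comment.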

According to my measurements, this patch does not introduce a measurable performance degradation, even though it performs two linear scans (one for each needle).

I introduce the ignore-bison-generated-files and
ignore-flex-generated-files analyzer options; setting them to false
disables the skipping of these files. Both options are true by default.

[1]: https://github.com/cosmos72/twin/blob/master/server/rcparse_lex.cpp#L7
[2]: https://github.com/marcauresoar/make-examples/blob/22362cdcf9dd7c597b5049ce7f176621e2e9ac7a/sandbox/count-words/lexer.c#L6
[3]: https://github.com/vladcmanea/2nd-faculty-year-Formal-Languages---Automata-assignments/blob/11abdf64629d9eb741438ba69f04636769d5a374/lab1/lex.yy.c#L6

[4]: https://github.com/KritikaChoudhary/System-Software-Lab/blob/47f5b2cfe2a2738fd54eae9f8439817f6a22034e/B_yacc/1/y1.tab.h#L2
[5]: https://github.com/VirtualMonitor/VirtualMonitor/blob/71d1bf9b1e7b392a7bd0c73dc217138dc5865651/src/VBox/Additions/x11/x11include/xorg-server-1.8.0/parser.h#L2
[6]: https://github.com/bspaulding/DrawTest/blob/3f773ceb13de14275429036b9cbc5aa19e29bab9/Framework/OpenEars.framework/Versions/A/Headers/jsgf_parser.h#L2

Event Timeline

steakhal created this revision. Nov 24 2021, 12:14 AM
steakhal requested review of this revision. Nov 24 2021, 12:14 AM
steakhal edited the summary of this revision. Nov 24 2021, 12:19 AM

I wonder if this feature should be part of CodeChecker/scan-build instead (potentially under an option).

steakhal updated this revision to Diff 390010. Nov 26 2021, 4:36 AM
steakhal edited the summary of this revision.

Added analyzer options disabling the skipping of bison and flex files.

xazax.hun accepted this revision. Nov 29 2021, 12:57 PM

Since we already had some precedent, it looks good to me. But I still think it would be nicer to have these features outside of the analyzer in the future.

This revision is now accepted and ready to land. Nov 29 2021, 12:57 PM

Ah, now I get it.
That would mean that I have to patch this in scan-build and CodeChecker individually. Makes sense. Let me think about it.

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. Dec 6 2021, 1:20 AM
Herald added a subscriber: cfe-commits.