This is an archive of the discontinued LLVM Phabricator instance.

[pseudo] Add benchmarks for pseudoparser.
ClosedPublic

Authored by hokein on May 9 2022, 6:07 AM.

Details

Summary

Running on SemaDecl.cpp with the cxx.bnf grammar:

--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
runParseBNFGrammar      649389 ns       649365 ns         1013
runBuildLR            34591903 ns     34591380 ns           20
runPreprocessTokens   11418744 ns     11418703 ns           61 bytes_per_second=63.8971M/s
runGLRParse          282996863 ns    282988726 ns            2 bytes_per_second=2.57827M/s
runParseOverall      294969719 ns    294951870 ns            2 bytes_per_second=2.4737M/s

Diff Detail

Event Timeline

hokein created this revision.May 9 2022, 6:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 9 2022, 6:07 AM
Herald added a subscriber: mgorny. · View Herald Transcript
hokein requested review of this revision.May 9 2022, 6:07 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 9 2022, 6:07 AM
sammccall accepted this revision.May 9 2022, 7:16 AM

Nice! This will be very useful. I think we should have some fixed checked-in examples, but we can add them later.

BTW, the fuzzer identifies slow inputs, I have attached one (though it looks like x::x::x::x::... might be enough.

clang-tools-extra/pseudo/benchmarks/Benchmark.cpp
8

Can you add a brief description and usage note here? (realistic, not BENCHMARK_OPTIONS)
In particular something useful for benchmarking the parser overall, which will be more important long-term than specific operations like grammar compilation.

(It'd be nice if we could make overall parsing the *default* benchmark action, but I'm not sure if that's easy)

41

I'd avoid helper functions for the content of the benchmark loop, because it makes it less obvious exactly what you're benchmarking.

in particular in this case, I think Diags should be written outside the loop (it shouldn't matter in practice if there are no diags, but it's subtle)

50

rather than have each benchmark read the inputs and compile the grammar, it seems a bit less repetitive to do this in main() or a single setup function called from there, and share the const inputs.

The only real downside I see is that some of the benchmarks don't require source, but source text could default to an empty string.

56

runBuildLSR?

65

if the benchmark is "runGLRParse" then everything apart from the glrParse call should be outside the test loop I think. (and such a benchmark is useful)

If the benchmark is intended to be end-to-end (also useful) then it should have a more generic name, and we should expect to include disambiguation, syntax-tree forming etc in the future.

I think we should probably have both!

77

you need to call chooseConditionalBranches before stripping, otherwise no branches will be taken which is not realistic for real code

99

instead of manipulating argv by hand, maybe first call benchmark::initialize and then call llvm:🆑:ParseCommandLineOptions?

See llvm/utils/unittest/UnitTestMain/TestMain.cpp

(Hmm, maybe I should have looked into this for the fuzzer)

This revision is now accepted and ready to land.May 9 2022, 7:16 AM
hokein updated this revision to Diff 428323.May 10 2022, 2:57 AM
hokein marked 6 inline comments as done.

address comments

hokein added inline comments.May 10 2022, 2:58 AM
clang-tools-extra/pseudo/benchmarks/Benchmark.cpp
50

moved to a common setup which is invoked in the main function.

65

yeah, my intent is to run an overall parse. Renamed to` runParseOverral`, and added runGLRParse for the specific glrParse benchmark.

hokein edited the summary of this revision. (Show Details)May 10 2022, 2:59 AM

I think we should have some fixed checked-in examples, but we can add them later.

Yeah, it is a good idea to have some fixed datasets checked in (but we will not run them in buildbots).

BTW, the fuzzer identifies slow inputs, I have attached one (though it looks like x::x::x::x::... might be enough.

Thanks, this is really useful and interesting (I was struggling to find out such an example).

This revision was automatically updated to reflect the committed changes.