This is an archive of the discontinued LLVM Phabricator instance.


Differential D125226

[pseudo] Add benchmarks for pseudoparser.
Closed, Public

Authored by hokein on May 9 2022, 6:07 AM.


Details

Reviewers

sammccall

Commits

rGbe895d5768d5: [pseudo] Add benchmarks for pseudoparser.

Summary

Running on SemaDecl.cpp with the cxx.bnf grammar:

--------------------------------------------------------------
Benchmark                    Time             CPU   Iterations
--------------------------------------------------------------
runParseBNFGrammar      649389 ns       649365 ns         1013
runBuildLR            34591903 ns     34591380 ns           20
runPreprocessTokens   11418744 ns     11418703 ns           61 bytes_per_second=63.8971M/s
runGLRParse          282996863 ns    282988726 ns            2 bytes_per_second=2.57827M/s
runParseOverall      294969719 ns    294951870 ns            2 bytes_per_second=2.4737M/s
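The bytes_per_second column can be cross-checked against the CPU column: multiplying the rate by the time per iteration should give the same input size for every row that processes the same file. The following is a minimal sketch of that arithmetic, not part of the patch. `impliedInputBytes` is a hypothetical helper, and the code assumes the `M` suffix means 2^20 bytes; if it means 10^6, the absolute sizes change but the rows still agree with each other.

```cpp
#include <cassert>

// Hypothetical helper: recover the number of bytes processed per iteration
// from a reported CPU time (in ns) and a reported throughput (in MiB/s).
// Assumption: "M" in the benchmark output means 2^20 bytes.
double impliedInputBytes(double CpuNanos, double MiBPerSecond) {
  assert(CpuNanos > 0 && MiBPerSecond > 0);
  const double BytesPerSecond = MiBPerSecond * 1024.0 * 1024.0;
  const double Seconds = CpuNanos * 1e-9;
  return BytesPerSecond * Seconds;
}
```

Calling it with the three rows above, `impliedInputBytes(11418703, 63.8971)`, `impliedInputBytes(282988726, 2.57827)` and `impliedInputBytes(294951870, 2.4737)`, yields roughly the same size each time (about 765,000 bytes under the stated assumption), which is what one would expect when all three benchmarks read the same SemaDecl.cpp.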

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

hokein created this revision. May 9 2022, 6:07 AM

Herald added a project: Restricted Project. May 9 2022, 6:07 AM

Herald added a subscriber: mgorny.

hokein requested review of this revision. May 9 2022, 6:07 AM

Herald added a project: Restricted Project. May 9 2022, 6:07 AM

Herald added a subscriber: alextsao1999.

Harbormaster completed remote builds in B163468: Diff 428056. May 9 2022, 6:19 AM

Nice! This will be very useful. I think we should have some fixed checked-in examples, but we can add them later.

BTW, the fuzzer identifies slow inputs; I have attached one (though it looks like x::x::x::x::... might be enough).

slow-unit-f2ba0885392b8fd32237e004e9c3be20175a1f86 (774 B)

clang-tools-extra/pseudo/benchmarks/Benchmark.cpp
8	Can you add a brief description and usage note here? (realistic, not BENCHMARK_OPTIONS) In particular something useful for benchmarking the parser overall, which will be more important long-term than specific operations like grammar compilation. (It'd be nice if we could make overall parsing the default benchmark action, but I'm not sure if that's easy)
41	I'd avoid helper functions for the content of the benchmark loop, because it makes it less obvious exactly what you're benchmarking. in particular in this case, I think Diags should be written outside the loop (it shouldn't matter in practice if there are no diags, but it's subtle)
50	rather than have each benchmark read the inputs and compile the grammar, it seems a bit less repetitive to do this in main() or a single setup function called from there, and share the const inputs. The only real downside I see is that some of the benchmarks don't require source, but source text could default to an empty string.
56	runBuildLSR?
65	if the benchmark is "runGLRParse" then everything apart from the glrParse call should be outside the test loop I think. (and such a benchmark is useful) If the benchmark is intended to be end-to-end (also useful) then it should have a more generic name, and we should expect to include disambiguation, syntax-tree forming etc in the future. I think we should probably have both!
77	you need to call chooseConditionalBranches before stripping, otherwise no branches will be taken which is not realistic for real code
99	instead of manipulating argv by hand, maybe first call benchmark::initialize and then call llvm::cl::ParseCommandLineOptions? See llvm/utils/unittest/UnitTestMain/TestMain.cpp (Hmm, maybe I should have looked into this for the fuzzer)
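The distinction drawn in the comments on lines 41 and 65 (a narrow benchmark keeps all setup outside the measured loop, an end-to-end benchmark repeats every stage inside it) can be illustrated without the benchmark library. This is only a sketch: `preprocess`, `parse`, `runParseOnly` and `runOverall` are hypothetical stand-ins that count how often each stage runs, not clang-pseudo APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Counts how many times each (stand-in) pipeline stage was executed.
struct Counters {
  int Preprocess = 0;
  int Parse = 0;
};

// Stand-in for lexing/preprocessing: splits the source on spaces.
std::vector<std::string> preprocess(const std::string &Source, Counters &C) {
  ++C.Preprocess;
  std::vector<std::string> Tokens;
  std::string Current;
  for (char Ch : Source) {
    if (Ch != ' ') {
      Current += Ch;
    } else if (!Current.empty()) {
      Tokens.push_back(Current);
      Current.clear();
    }
  }
  if (!Current.empty())
    Tokens.push_back(Current);
  return Tokens;
}

// Stand-in for the parser: just reports the token count.
std::size_t parse(const std::vector<std::string> &Tokens, Counters &C) {
  ++C.Parse;
  return Tokens.size();
}

// Narrow benchmark shape: preprocessing runs once, outside the measured
// loop, so the loop body contains only the stage under test.
Counters runParseOnly(const std::string &Source, int Iterations) {
  assert(Iterations >= 0);
  Counters C;
  std::vector<std::string> Tokens = preprocess(Source, C);
  for (int I = 0; I < Iterations; ++I)
    parse(Tokens, C);
  return C;
}

// End-to-end benchmark shape: every stage is repeated inside the loop.
Counters runOverall(const std::string &Source, int Iterations) {
  assert(Iterations >= 0);
  Counters C;
  for (int I = 0; I < Iterations; ++I)
    parse(preprocess(Source, C), C);
  return C;
}
```

With five iterations, `runParseOnly` preprocesses once and parses five times, while `runOverall` does both five times. Keeping the loop body free of helper calls that hide extra stages makes it obvious which of the two shapes a given benchmark has.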

This revision is now accepted and ready to land. May 9 2022, 7:16 AM

address comments

hokein added inline comments. May 10 2022, 2:58 AM

clang-tools-extra/pseudo/benchmarks/Benchmark.cpp
50	moved to a common setup which is invoked in the main function.
65	yeah, my intent is to run an overall parse. Renamed to `runParseOverall`, and added `runGLRParse` for the specific `glrParse` benchmark.

hokein edited the summary of this revision. May 10 2022, 2:59 AM

Harbormaster completed remote builds in B163656: Diff 428323. May 10 2022, 3:08 AM

In D125226#3500691, @sammccall wrote:

I think we should have some fixed checked-in examples, but we can add them later.

Yeah, it is a good idea to have some fixed datasets checked in (but we will not run them on the buildbots).

BTW, the fuzzer identifies slow inputs; I have attached one (though it looks like x::x::x::x::... might be enough).
slow-unit-f2ba0885392b8fd32237e004e9c3be20175a1f86 (774 B)

Thanks, this is really useful and interesting (I was struggling to find such an example).

Closed by commit rGbe895d5768d5: [pseudo] Add benchmarks for pseudoparser. (authored by hokein). May 10 2022, 5:16 AM

This revision was automatically updated to reflect the committed changes.

hokein added a commit: rGbe895d5768d5: [pseudo] Add benchmarks for pseudoparser.

Revision Contents

Path                                                  Size
clang-tools-extra/pseudo/CMakeLists.txt               1 line
clang-tools-extra/pseudo/benchmarks/Benchmark.cpp     139 lines
clang-tools-extra/pseudo/benchmarks/CMakeLists.txt    7 lines

Diff 428348

clang-tools-extra/pseudo/CMakeLists.txt

  include_directories(include)
  include_directories(${CMAKE_CURRENT_BINARY_DIR}/include)
  add_subdirectory(lib)
  add_subdirectory(tool)
  add_subdirectory(fuzzer)
+ add_subdirectory(benchmarks)
  if(CLANG_INCLUDE_TESTS)
    add_subdirectory(unittests)
    add_subdirectory(test)
  endif()

clang-tools-extra/pseudo/benchmarks/Benchmark.cpp

This file was added.

//===--- Benchmark.cpp -  clang pseudoparser benchmarks ---------*- C++ -*-===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===//
//
// Benchmark for the overall pseudoparser performance. It also covers other
// important pieces of the pseudoparser (grammar compilation, LR table build,
// etc.).
//
// Note: make sure to build it in Release mode.
//
// Usage:
//   tools/clang/tools/extra/pseudo/benchmarks/ClangPseudoBenchmark \
//      --grammar=/path/to/cxx.bnf --source=/path/to/source-to-parse.cpp \
//      --benchmark_filter=runParseOverall
//
//===----------------------------------------------------------------------===//

#include "benchmark/benchmark.h"
#include "clang-pseudo/DirectiveTree.h"
#include "clang-pseudo/Forest.h"
#include "clang-pseudo/GLR.h"
#include "clang-pseudo/Grammar.h"
#include "clang-pseudo/LRTable.h"
#include "clang-pseudo/Token.h"
#include "clang/Basic/LangOptions.h"
#include "llvm/ADT/StringRef.h"
#include "llvm/Support/CommandLine.h"
#include "llvm/Support/ErrorOr.h"
#include "llvm/Support/MemoryBuffer.h"
#include "llvm/Support/raw_ostream.h"
#include <string>

using llvm::cl::desc;
using llvm::cl::init;
using llvm::cl::opt;

static opt<std::string> GrammarFile("grammar",
                                    desc("Parse and check a BNF grammar file."),
                                    init(""));
static opt<std::string> Source("source", desc("Source file"));

namespace clang {
namespace pseudo {
namespace {

const std::string *GrammarText = nullptr;
const std::string *SourceText = nullptr;
const Grammar *G = nullptr;

void setupGrammarAndSource() {
  auto ReadFile = [](llvm::StringRef FilePath) -> std::string {
    llvm::ErrorOr<std::unique_ptr<llvm::MemoryBuffer>> GrammarText =
        llvm::MemoryBuffer::getFile(FilePath);
    if (std::error_code EC = GrammarText.getError()) {
      llvm::errs() << "Error: can't read file '" << FilePath
                   << "': " << EC.message() << "\n";
      std::exit(1);
    }
    return GrammarText.get()->getBuffer().str();
  };
  GrammarText = new std::string(ReadFile(GrammarFile));
  SourceText = new std::string(ReadFile(Source));
  std::vector<std::string> Diags;
  G = Grammar::parseBNF(*GrammarText, Diags).release();
}

static void runParseBNFGrammar(benchmark::State &State) {
  std::vector<std::string> Diags;
  for (auto _ : State)
    Grammar::parseBNF(*GrammarText, Diags);
}
BENCHMARK(runParseBNFGrammar);

static void runBuildLR(benchmark::State &State) {
  for (auto _ : State)
    clang::pseudo::LRTable::buildSLR(*G);
}
BENCHMARK(runBuildLR);

TokenStream parseableTokenStream() {
  clang::LangOptions LangOpts = genericLangOpts();
  TokenStream RawStream = clang::pseudo::lex(*SourceText, LangOpts);
  auto DirectiveStructure = DirectiveTree::parse(RawStream);
  clang::pseudo::chooseConditionalBranches(DirectiveStructure, RawStream);
  TokenStream Cook =
      cook(DirectiveStructure.stripDirectives(RawStream), LangOpts);
  return clang::pseudo::stripComments(Cook);
}

static void runPreprocessTokens(benchmark::State &State) {
  for (auto _ : State)
    parseableTokenStream();
  State.SetBytesProcessed(static_cast<uint64_t>(State.iterations()) *
                          SourceText->size());
}
BENCHMARK(runPreprocessTokens);

static void runGLRParse(benchmark::State &State) {
  clang::LangOptions LangOpts = genericLangOpts();
  LRTable Table = clang::pseudo::LRTable::buildSLR(*G);
  TokenStream ParseableStream = parseableTokenStream();
  for (auto _ : State) {
    pseudo::ForestArena Forest;
    pseudo::GSS GSS;
    glrParse(ParseableStream, ParseParams{*G, Table, Forest, GSS});
  }
  State.SetBytesProcessed(static_cast<uint64_t>(State.iterations()) *
                          SourceText->size());
}
BENCHMARK(runGLRParse);

static void runParseOverall(benchmark::State &State) {
  clang::LangOptions LangOpts = genericLangOpts();
  LRTable Table = clang::pseudo::LRTable::buildSLR(*G);
  for (auto _ : State) {
    pseudo::ForestArena Forest;
    pseudo::GSS GSS;
    glrParse(parseableTokenStream(), ParseParams{*G, Table, Forest, GSS});
  }
  State.SetBytesProcessed(static_cast<uint64_t>(State.iterations()) *
                          SourceText->size());
}
BENCHMARK(runParseOverall);

} // namespace
} // namespace pseudo
} // namespace clang

int main(int argc, char *argv[]) {
  benchmark::Initialize(&argc, argv);
  llvm::cl::ParseCommandLineOptions(argc, argv);
  clang::pseudo::setupGrammarAndSource();
  benchmark::RunSpecifiedBenchmarks();
  return 0;
}
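The order of the first two calls in main() matters. Assuming benchmark::Initialize strips the flags it recognizes from argv and adjusts argc (which is how it is generally understood to behave), llvm::cl::ParseCommandLineOptions then only sees what is left, i.e. --grammar and --source. A minimal sketch of that two-stage pattern follows; `consumeBenchmarkFlags` is a hypothetical stand-in written for illustration, not the library's implementation.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical first-stage parser: removes every argument that starts with
// "--benchmark_" from Argv (argv[0] is always kept) and updates *Argc, so
// that a second parser only sees the arguments that were left over.
void consumeBenchmarkFlags(int *Argc, char **Argv) {
  assert(Argc && *Argc >= 1);
  const char *Prefix = "--benchmark_";
  const std::size_t PrefixLen = std::strlen(Prefix);
  int Kept = 1; // Index of the next slot to fill; slot 0 is the program name.
  for (int I = 1; I < *Argc; ++I) {
    if (std::strncmp(Argv[I], Prefix, PrefixLen) != 0)
      Argv[Kept++] = Argv[I]; // Not a benchmark flag: keep it, in order.
  }
  *Argc = Kept;
}
```

Given the arguments from the usage comment (`--grammar=...`, `--source=...`, `--benchmark_filter=runParseOverall`), the sketch leaves only the program name, `--grammar=...` and `--source=...` for the second parser, which avoids manipulating argv by hand in main().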

clang-tools-extra/pseudo/benchmarks/CMakeLists.txt

This file was added.

add_benchmark(ClangPseudoBenchmark Benchmark.cpp)

target_link_libraries(ClangPseudoBenchmark
  PRIVATE
  clangPseudo
  LLVMSupport
  )