This is an archive of the discontinued LLVM Phabricator instance.


Differential D83942

[analyzer][tests] Add a notion of project sizes
Closed, Public

Authored by vsavchenko on Jul 16 2020, 5:11 AM.


Details

Reviewers

NoQ
xazax.hun
Szelethus
dcoughlin

Commits

rGaec12c1264ac: [analyzer][tests] Add a notion of project sizes

Summary

With the number of projects growing, it is important to be able to
filter them in a more convenient way than by name. It is especially
important for benchmarks, where it is not viable to analyze big
projects 20 or 50 times in a row.

For this reason, this commit adds a notion of sizes and a
filtering interface that puts a limit on the maximum size of the
projects to analyze or benchmark.
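A minimal sketch of how such a limit can work, assuming sizes form an ordered integer enum with a catch-all UNSPECIFIED value declared last, so that a single `size <= limit` comparison does the filtering. The `Project` tuple and the `filter_by_max_size` helper are hypothetical simplifications, not the code from this revision:

```python
from enum import Enum, auto
from typing import List, NamedTuple, Optional


class Size(int, Enum):
    # Declaration order defines the ordering; UNSPECIFIED is deliberately last.
    TINY = auto()
    SMALL = auto()
    BIG = auto()
    HUGE = auto()
    UNSPECIFIED = auto()


class Project(NamedTuple):
    # Hypothetical, trimmed-down stand-in for a project entry.
    name: str
    size: Size = Size.UNSPECIFIED


def filter_by_max_size(projects: List[Project],
                       max_size: Optional[str]) -> List[Project]:
    # No limit given: UNSPECIFIED is the largest value, so every project passes.
    # An unknown name raises KeyError in this simplified version.
    limit = Size.UNSPECIFIED if max_size is None else Size[max_size.upper()]
    return [p for p in projects if p.size <= limit]


projects = [Project("cxxopts", Size.TINY), Project("box2d", Size.SMALL),
            Project("duckdb", Size.BIG), Project("unsized")]
print([p.name for p in filter_by_max_size(projects, "small")])
# ['cxxopts', 'box2d']
print(len(filter_by_max_size(projects, None)))
# 4
```

With this ordering, a project without a size is dropped for every concrete limit and kept only when no limit is given.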

Sizes assigned to the projects in this commit do not directly
correspond to the number of lines or files in the project. The key
factor that is important for the developers of the analyzer is the
time it takes to analyze the project. For this very reason, "size"
is essentially a way to cluster projects by their analysis time.
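Since a size is really an analysis-time bucket, assigning one amounts to classifying a measured duration. Below is a sketch of a hypothetical `size_for_minutes` helper using the thresholds listed in the `Size` docstring (TINY <1min, SMALL 1min-10min, BIG 10min-1h, HUGE >1h). The bucket that an exact boundary such as 10 minutes falls into is an assumption made here:

```python
from enum import Enum, auto


class Size(int, Enum):
    TINY = auto()
    SMALL = auto()
    BIG = auto()
    HUGE = auto()
    UNSPECIFIED = auto()


def size_for_minutes(minutes: float) -> Size:
    # Thresholds follow the Size docstring: <1min, 1-10min, 10min-1h, >1h.
    # Assumption: exactly 1 and 10 minutes go to the larger bucket, while
    # exactly 60 minutes stays BIG because HUGE is ">1h".
    if minutes < 1:
        return Size.TINY
    if minutes < 10:
        return Size.SMALL
    if minutes <= 60:
        return Size.BIG
    return Size.HUGE


print([size_for_minutes(m).name for m in (0.5, 5, 30, 90)])
# ['TINY', 'SMALL', 'BIG', 'HUGE']
```

Analysis times vary between machines, so a helper like this is better treated as a hint for choosing a size than as a strict rule.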

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

vsavchenko created this revision. Jul 16 2020, 5:11 AM

Herald added a project: Restricted Project. Jul 16 2020, 5:11 AM

Herald added subscribers: cfe-commits, ASDenysPetrov, Charusso and 7 others.

Harbormaster failed remote builds in B64496: Diff 278438! Jul 16 2020, 5:51 AM

Szelethus:

I don't speak snek, but I approve this message!

It's also great that we have dedicated tiny/small projects to analyze locally.

clang/utils/analyzer/ProjectMap.py, line 31:
Just an observation rather than a suggestion: it's interesting that we don't have a MEDIUM size in between SMALL and BIG. I think the TINY category describes the sub-minute runs well, and it'd be awkward to introduce a project in between SMALL and BIG, so I don't immediately see a time interval we need to categorize.

I suppose LLVM could be a HUGE project?

vsavchenko:

In D83942#2155762, @Szelethus wrote:

I don't speak snek, but I approve this message!

Thanks 😊

I suppose LLVM could be a HUGE project?

Yurp: LLVM, pytorch, and (surprisingly!) CMake.

clang/utils/analyzer/ProjectMap.py, line 31:
Yeah, maybe it's good to add MEDIUM with 10min-30min.
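`Size.from_str` looks sizes up by member name rather than by value. That means a hypothetical MEDIUM bucket for 10min-30min could be slotted between SMALL and BIG without touching the parser, and `auto()` renumbers the later members so the ordering stays consistent. A sketch under that assumption (MEDIUM is not part of this revision):

```python
from enum import Enum, auto
from typing import Optional


class Size(int, Enum):
    TINY = auto()
    SMALL = auto()
    MEDIUM = auto()  # hypothetical: 10min-30min, as suggested in the review
    BIG = auto()
    HUGE = auto()
    UNSPECIFIED = auto()

    @staticmethod
    def from_str(raw_size: Optional[str]) -> "Size":
        # Same name-based lookup as in the diff, so new members need no
        # changes here.
        if raw_size is None:
            return Size.UNSPECIFIED
        for possible_size in Size:
            if possible_size.name == raw_size.upper():
                return possible_size
        raise ValueError(f"Incorrect project size '{raw_size}'")


print(Size.from_str("medium").name)  # MEDIUM
print(Size.SMALL < Size.MEDIUM < Size.BIG)  # True
```

Renumbering BIG and HUGE would be harmless because projects.json stores sizes by name ("tiny", "small", ...), not by numeric value.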

NoQ:

Sizes assigned to the projects in this commit, do not directly
correspond to the number of lines or files in the project.

Maybe QUICK/NORMAL/SLOW then? Or by purpose: BENCHMARK/DAILY/PARANOID?

vsavchenko:

In D83942#2156320, @NoQ wrote:

Sizes assigned to the projects in this commit, do not directly
correspond to the number of lines or files in the project.

Maybe QUICK/NORMAL/SLOW then? Or by purpose: BENCHMARK/DAILY/PARANOID?

I am not against the idea of naming it differently, but then we need to get all of the related vocabulary straight. What is the noun? Speed? And what should the command-line option be?

Make --projects and --max-size compatible
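After this update the two options compose. `--projects` selects an explicit set (forcing those projects on even if the map disables them), and `--max-size` then narrows whatever is still enabled. The sketch below mirrors the `filter_projects` helper from the diff, using a trimmed-down tuple and `NamedTuple._replace` in place of `with_fields`. The `Info` type and the sample data are assumptions for illustration:

```python
from enum import Enum, auto
from typing import Callable, List, NamedTuple


class Size(int, Enum):
    TINY = auto()
    SMALL = auto()
    BIG = auto()
    HUGE = auto()
    UNSPECIFIED = auto()


class Info(NamedTuple):
    # Hypothetical stand-in for ProjectInfo with only the fields used here.
    name: str
    enabled: bool
    size: Size


def filter_projects(projects: List[Info], predicate: Callable[[Info], bool],
                    force: bool = False) -> List[Info]:
    # force=True ignores the current 'enabled' flag (the --projects case);
    # otherwise a project stays enabled only if it already was.
    return [p._replace(enabled=(force or p.enabled) and predicate(p))
            for p in projects]


projects = [Info("cxxopts", True, Size.TINY),
            Info("duckdb", True, Size.BIG),
            Info("tmux", False, Size.BIG)]

# Equivalent of: --projects cxxopts,tmux --max-size small
selected = filter_projects(projects, lambda p: p.name in {"cxxopts", "tmux"},
                           force=True)
selected = filter_projects(selected, lambda p: p.size <= Size.SMALL)
print([p.name for p in selected if p.enabled])
# ['cxxopts']
```

The second pass does not use `force`, so it can only switch projects off. The order of the two filters therefore matters.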

Harbormaster completed remote builds in B65195: Diff 279738. Jul 22 2020, 3:12 AM

Keep SATest.py Python2 compatible

Herald added a subscriber: steakhal. Aug 6 2020, 2:56 AM

Harbormaster completed remote builds in B67286: Diff 283548. Aug 6 2020, 3:06 AM

NoQ:

I can't say any of my ideas about better naming are exceptionally bright. The existing scale seems perfectly reasonable.

This revision is now accepted and ready to land. Aug 7 2020, 4:57 PM

vsavchenko mentioned this in D86295: [analyzer] Reorder the layout of MemRegion and cache by hand for optimal size. Aug 24 2020, 3:53 AM

This revision was landed with ongoing or failed builds. Aug 24 2020, 6:13 AM

Closed by commit rGaec12c1264ac: [analyzer][tests] Add a notion of project sizes (authored by vsavchenko).

This revision was automatically updated to reflect the committed changes.

vsavchenko added a commit: rGaec12c1264ac: [analyzer][tests] Add a notion of project sizes.

Revision Contents

Path                                          Size
clang/utils/analyzer/ProjectMap.py            64 lines
clang/utils/analyzer/SATest.py                34 lines
clang/utils/analyzer/projects/projects.json   60 lines

Diff 279738

clang/utils/analyzer/ProjectMap.py

```diff
 import json
 import os
 
-from enum import Enum
+from enum import auto, Enum
 from typing import Any, Dict, List, NamedTuple, Optional, Tuple
 
 
 JSON = Dict[str, Any]
 
 
 DEFAULT_MAP_FILE = "projects.json"
 
 
 class DownloadType(str, Enum):
     GIT = "git"
     ZIP = "zip"
     SCRIPT = "script"
 
 
+class Size(int, Enum):
+    """
+    Size of the project.
+
+    Sizes do not directly correspond to the number of lines or files in the
+    project. The key factor that is important for the developers of the
+    analyzer is the time it takes to analyze the project. Here is how
+    the following sizes map to times:
+
+    TINY: <1min
+    SMALL: 1min-10min
+    BIG: 10min-1h
+    HUGE: >1h
+
+    The borders are a bit of a blur, especially because analysis time varies
+    from one machine to another. However, the relative times will stay pretty
+    similar, and these groupings will still be helpful.
+
+    UNSPECIFIED is a very special case, which is intentionally last in the list
+    of possible sizes. If the user wants to filter projects by one of the
+    possible sizes, we want projects with UNSPECIFIED size to be filtered out
+    for any given size.
+    """
+    TINY = auto()
+    SMALL = auto()
+    BIG = auto()
+    HUGE = auto()
+    UNSPECIFIED = auto()
+
+    @staticmethod
+    def from_str(raw_size: Optional[str]) -> "Size":
+        """
+        Construct a Size object from an optional string.
+
+        :param raw_size: optional string representation of the desired Size
+                         object. None will produce UNSPECIFIED size.
+
+        This method is case-insensitive, so raw sizes 'tiny', 'TINY', and
+        'TiNy' will produce the same result.
+        """
+        if raw_size is None:
+            return Size.UNSPECIFIED
+
+        raw_size_upper = raw_size.upper()
+        # The implementation is decoupled from the actual values of the enum,
+        # so we can easily add or modify it without bothering about this
+        # function.
+        for possible_size in Size:
+            if possible_size.name == raw_size_upper:
+                return possible_size
+
+        possible_sizes = [size.name.lower() for size in Size
+                          # no need in showing our users this size
+                          if size != Size.UNSPECIFIED]
+        raise ValueError(f"Incorrect project size '{raw_size}'. "
+                         f"Available sizes are {possible_sizes}")
+
+
 class ProjectInfo(NamedTuple):
     """
     Information about a project to analyze.
     """
     name: str
     mode: int
     source: DownloadType = DownloadType.SCRIPT
     origin: str = ""
     commit: str = ""
     enabled: bool = True
+    size: Size = Size.UNSPECIFIED
 
     def with_fields(self, **kwargs) -> "ProjectInfo":
         """
         Create a copy of this project info with customized fields.
         NamedTuple is immutable and this is a way to create modified copies.
 
           info.enabled = True
           info.mode = 1
@@ ... @@ class ProjectMap:
 
     @staticmethod
     def _parse_project(raw_project: JSON) -> ProjectInfo:
         try:
             name: str = raw_project["name"]
             build_mode: int = raw_project["mode"]
             enabled: bool = raw_project.get("enabled", True)
             source: DownloadType = raw_project.get("source", "zip")
+            size = Size.from_str(raw_project.get("size", None))
 
             if source == DownloadType.GIT:
                 origin, commit = ProjectMap._get_git_params(raw_project)
             else:
                 origin, commit = "", ""
 
             return ProjectInfo(name, build_mode, source, origin, commit,
-                               enabled)
+                               enabled, size)
 
         except KeyError as e:
             raise ValueError(
                 f"Project info is required to have a '{e.args[0]}' field")
 
     @staticmethod
     def _get_git_params(raw_project: JSON) -> Tuple[str, str]:
         try:
```
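The new field is optional in `projects.json`. `_parse_project` passes `raw_project.get("size", None)` to `Size.from_str`, so a missing key yields UNSPECIFIED, matching is case-insensitive, and an unknown name raises a `ValueError` that lists the valid choices. The following is a condensed, self-contained sketch of that behaviour. The `parse_size` wrapper and the `legacy` entry are made up for illustration:

```python
import json
from enum import Enum, auto
from typing import Optional


class Size(int, Enum):
    TINY = auto()
    SMALL = auto()
    BIG = auto()
    HUGE = auto()
    UNSPECIFIED = auto()


def parse_size(raw_size: Optional[str]) -> Size:
    # Condensed copy of the Size.from_str logic from the diff above.
    if raw_size is None:
        return Size.UNSPECIFIED
    for possible_size in Size:
        if possible_size.name == raw_size.upper():
            return possible_size
    possible_sizes = [s.name.lower() for s in Size if s != Size.UNSPECIFIED]
    raise ValueError(f"Incorrect project size '{raw_size}'. "
                     f"Available sizes are {possible_sizes}")


entries = json.loads('[{"name": "fmt", "mode": 1, "size": "Small"},'
                     ' {"name": "legacy", "mode": 1}]')
# A missing "size" key maps to UNSPECIFIED, as _parse_project does via .get().
print([parse_size(entry.get("size")).name for entry in entries])
# ['SMALL', 'UNSPECIFIED']

try:
    parse_size("medium")
except ValueError as error:
    print(error)
# Incorrect project size 'medium'. Available sizes are ['tiny', 'small', 'big', 'huge']
```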

clang/utils/analyzer/SATest.py

```diff
@@ ... @@ def add(parser, args):
     SATestAdd.add_new_project(project)
 
 
 def build(parser, args):
     import SATestBuild
 
     SATestBuild.VERBOSE = args.verbose
 
-    projects = get_projects(parser, args.projects)
+    projects = get_projects(parser, args)
     tester = SATestBuild.RegressionTester(args.jobs,
                                           projects,
                                           args.override_compiler,
                                           args.extra_analyzer_config,
                                           args.regenerate,
                                           args.strictness)
     tests_passed = tester.test_all()
 
@@ ... @@ def update(parser, args):
     project_map = ProjectMap()
     for project in project_map.projects:
         SATestUpdateDiffs.update_reference_results(project)
 
 
 def benchmark(parser, args):
     from SATestBenchmark import Benchmark
 
-    projects = get_projects(parser, args.projects)
+    projects = get_projects(parser, args)
     benchmark = Benchmark(projects, args.iterations, args.output)
     benchmark.run()
 
 
 def benchmark_compare(parser, args):
     import SATestBenchmark
     SATestBenchmark.compare(args.old, args.new, args.output)
 
 
-def get_projects(parser, projects_str):
-    from ProjectMap import ProjectMap
+def get_projects(parser, args):
+    from ProjectMap import ProjectMap, Size
 
     project_map = ProjectMap()
     projects = project_map.projects
 
-    if projects_str:
-        projects_arg = projects_str.split(",")
+    def filter_projects(projects, predicate, force=False):
+        return [project.with_fields(enabled=(force or project.enabled) and
+                                    predicate(project))
+                for project in projects]
+
+    if args.projects:
+        projects_arg = args.projects.split(",")
         available_projects = [project.name
                               for project in projects]
 
         # validate that given projects are present in the project map file
         for manual_project in projects_arg:
             if manual_project not in available_projects:
                 parser.error("Project '{project}' is not found in "
                              "the project map file. Available projects are "
                              "{all}.".format(project=manual_project,
                                              all=available_projects))
 
-        projects = [project.with_fields(enabled=project.name in projects_arg)
-                    for project in projects]
+        projects = filter_projects(projects, lambda project:
+                                   project.name in projects_arg,
+                                   force=True)
+
+    try:
+        max_size = Size.from_str(args.max_size)
+    except ValueError as e:
+        parser.error(f"{e}")
+
+    projects = filter_projects(projects, lambda project:
+                               project.size <= max_size)
 
     return projects
 
 
 def docker(parser, args):
     if len(args.rest) > 0:
         if args.rest[0] != "--":
             parser.error("REST arguments should start with '--'")
@@ ... @@ build_parser.add_argument("-j", "--jobs", dest="jobs",
                               type=int, default=0,
                               help="Number of projects to test concurrently")
     build_parser.add_argument("--extra-analyzer-config",
                               dest="extra_analyzer_config", type=str,
                               default="",
                               help="Arguments passed to to -analyzer-config")
     build_parser.add_argument("--projects", action="store", default="",
                               help="Comma-separated list of projects to test")
+    build_parser.add_argument("--max-size", action="store", default=None,
+                              help="Maximum size for the projects to test")
     build_parser.add_argument("-v", "--verbose", action="count", default=0)
     build_parser.set_defaults(func=build)
 
     # compare subcommand
     cmp_parser = subparsers.add_parser(
         "compare",
         help="Comparing two static analyzer runs in terms of "
              "reported warnings and execution time statistics.")
@@ ... @@ bench_parser.add_argument("-i", "--iterations", action="store",
                               type=int, default=20,
                               help="Number of iterations for building each "
                                    "project.")
     bench_parser.add_argument("-o", "--output", action="store",
                               default="benchmark.csv",
                               help="Output csv file for the benchmark results")
     bench_parser.add_argument("--projects", action="store", default="",
                               help="Comma-separated list of projects to test")
+    bench_parser.add_argument("--max-size", action="store", default=None,
+                              help="Maximum size for the projects to test")
     bench_parser.set_defaults(func=benchmark)
 
     bench_subparsers = bench_parser.add_subparsers()
     bench_compare_parser = bench_subparsers.add_parser(
         "compare",
         help="Compare benchmark runs.")
     bench_compare_parser.add_argument("--old", action="store", required=True,
                                       help="Benchmark reference results to "
```
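On the command line, `--max-size` is a plain string option (`default=None`). `get_projects` converts it with `Size.from_str` and turns a `ValueError` into `parser.error`, which in argparse prints the usage and the message to stderr and exits with status 2. A minimal standalone sketch of that flow follows. The `prog` string, the `parse_max_size` helper, and the condensed `Size` are assumptions:

```python
import argparse
from enum import Enum, auto
from typing import List, Optional


class Size(int, Enum):
    TINY = auto()
    SMALL = auto()
    BIG = auto()
    HUGE = auto()
    UNSPECIFIED = auto()

    @staticmethod
    def from_str(raw_size: Optional[str]) -> "Size":
        # Condensed name-based lookup, as in ProjectMap.py.
        if raw_size is None:
            return Size.UNSPECIFIED
        for possible_size in Size:
            if possible_size.name == raw_size.upper():
                return possible_size
        raise ValueError(f"Incorrect project size '{raw_size}'")


def parse_max_size(argv: List[str]) -> Size:
    parser = argparse.ArgumentParser(prog="SATest.py benchmark")
    parser.add_argument("--max-size", action="store", default=None,
                        help="Maximum size for the projects to test")
    args = parser.parse_args(argv)
    try:
        return Size.from_str(args.max_size)
    except ValueError as e:
        # Prints the usage plus the message to stderr and exits with status 2.
        parser.error(str(e))


print(parse_max_size(["--max-size", "small"]).name)  # SMALL
print(parse_max_size([]).name)  # UNSPECIFIED
```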

clang/utils/analyzer/projects/projects.json

```diff
 [
   {
     "name": "cxxopts",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/jarro2783/cxxopts.git",
-    "commit": "794c975"
+    "commit": "794c975",
+    "size": "tiny"
   },
   {
     "name": "box2d",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/erincatto/box2d.git",
-    "commit": "1025f9a"
+    "commit": "1025f9a",
+    "size": "small"
   },
   {
     "name": "tinyexpr",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/codeplea/tinyexpr.git",
-    "commit": "ffb0d41"
+    "commit": "ffb0d41",
+    "size": "tiny"
   },
   {
     "name": "symengine",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/symengine/symengine.git",
-    "commit": "4f669d59"
+    "commit": "4f669d59",
+    "size": "small"
   },
   {
     "name": "termbox",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/nsf/termbox.git",
-    "commit": "0df1355"
+    "commit": "0df1355",
+    "size": "tiny"
   },
   {
     "name": "tinyvm",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/jakogut/tinyvm.git",
-    "commit": "10c25d8"
+    "commit": "10c25d8",
+    "size": "tiny"
   },
   {
     "name": "tinyspline",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/msteinbeck/tinyspline.git",
-    "commit": "f8b1ab7"
+    "commit": "f8b1ab7",
+    "size": "tiny"
   },
   {
     "name": "oatpp",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/oatpp/oatpp.git",
-    "commit": "d3e60fb"
+    "commit": "d3e60fb",
+    "size": "small"
   },
   {
     "name": "libsoundio",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/andrewrk/libsoundio.git",
-    "commit": "b810bf2"
+    "commit": "b810bf2",
+    "size": "tiny"
   },
   {
     "name": "zstd",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/facebook/zstd.git",
-    "commit": "2af4e073"
+    "commit": "2af4e073",
+    "size": "small"
   },
   {
     "name": "simbody",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/simbody/simbody.git",
-    "commit": "5cf513d"
+    "commit": "5cf513d",
+    "size": "big"
   },
   {
     "name": "duckdb",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/cwida/duckdb.git",
-    "commit": "d098c9f"
+    "commit": "d098c9f",
+    "size": "big"
   },
   {
     "name": "drogon",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/an-tao/drogon.git",
-    "commit": "fd2a612"
+    "commit": "fd2a612",
+    "size": "small"
   },
   {
     "name": "fmt",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/fmtlib/fmt.git",
-    "commit": "5e7c70e"
+    "commit": "5e7c70e",
+    "size": "small"
   },
   {
     "name": "re2",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/google/re2.git",
-    "commit": "2b25567"
+    "commit": "2b25567",
+    "size": "small"
   },
   {
     "name": "cppcheck",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/danmar/cppcheck.git",
-    "commit": "5fa3d53"
+    "commit": "5fa3d53",
+    "size": "small"
   },
   {
     "name": "harfbuzz",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/harfbuzz/harfbuzz.git",
-    "commit": "f8d345e"
+    "commit": "f8d345e",
+    "size": "small"
   },
   {
     "name": "capnproto",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/capnproto/capnproto.git",
-    "commit": "8be1c9f"
+    "commit": "8be1c9f",
+    "size": "small"
   },
   {
     "name": "tmux",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/tmux/tmux.git",
-    "commit": "a5f99e1"
+    "commit": "a5f99e1",
+    "size": "big"
   },
   {
     "name": "faiss",
     "mode": 1,
     "source": "git",
     "origin": "https://github.com/facebookresearch/faiss.git",
-    "commit": "9e5d5b7"
+    "commit": "9e5d5b7",
+    "size": "small"
   }
 ]
```
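With the sizes assigned in this diff, a `tiny` limit would select six of the twenty listed projects and a `small` limit seventeen. A small sketch that derives these counts from the name-to-size pairs. The `SIZES` dict is transcribed by hand from the JSON above rather than loaded from the real file, and `within` is a hypothetical helper:

```python
from collections import Counter
from typing import List

# Name -> size pairs transcribed from the projects.json diff.
SIZES = {
    "cxxopts": "tiny", "box2d": "small", "tinyexpr": "tiny",
    "symengine": "small", "termbox": "tiny", "tinyvm": "tiny",
    "tinyspline": "tiny", "oatpp": "small", "libsoundio": "tiny",
    "zstd": "small", "simbody": "big", "duckdb": "big",
    "drogon": "small", "fmt": "small", "re2": "small",
    "cppcheck": "small", "harfbuzz": "small", "capnproto": "small",
    "tmux": "big", "faiss": "small",
}

# Same ordering as the Size enum (UNSPECIFIED omitted: every entry has a size).
ORDER = ["tiny", "small", "big", "huge"]


def within(max_size: str) -> List[str]:
    # Names of the projects whose size does not exceed max_size.
    limit = ORDER.index(max_size)
    return [name for name, size in SIZES.items() if ORDER.index(size) <= limit]


print(Counter(SIZES.values()))
print(within("tiny"))
# ['cxxopts', 'tinyexpr', 'termbox', 'tinyvm', 'tinyspline', 'libsoundio']
```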