This is an archive of the discontinued LLVM Phabricator instance.


Differential D53598

Add docs+a script for building clang/LLVM with PGO
ClosedPublic

Authored by george.burgess.iv on Oct 23 2018, 11:58 AM.


Details

Reviewers

rnk
hans

Commits

rGcf477f4e41d5: Add docs+a script for building clang/LLVM with PGO
rL345427: Add docs+a script for building clang/LLVM with PGO

Summary

Depending on who you ask, PGO grants a 15%-25% improvement in build times when using clang. Sadly, hooking everything up properly to generate a profile and apply it to clang isn't always straightforward. This script (and the accompanying docs) aim to make this process easier; ideally, it takes only a single invocation of the given script.

In terms of testing, I've got a cronjob on my Debian box that's meant to run this a few times per week, and I tried manually running it on a puny Gentoo box I have (four whole Atom cores!). Nothing obviously broke. ¯\_(ツ)_/¯

Don't really know who to tag for the review of this; IIRC, I chatted with the two of you about it at the dev conf? Other voices appreciated. :)

FWIW, I don't know if we have a Python style guide, so I just shoved this through yapf with all the defaults on. Happy to paint it any color we like, as long as I can do so with a tool.

Finally, though the focus is clang at the moment, the hope is that this is easily applicable to other LLVM-y tools with minimal effort (e.g. lld, opt, ...). Hence, this lives in llvm/utils and tries to be somewhat ambiguous about naming.

Diff Detail

Event Timeline

george.burgess.iv created this revision. Oct 23 2018, 11:58 AM

Herald added a subscriber: arphaman. Oct 23 2018, 11:58 AM

I didn't have time to look at the script yet, but I read through the instructions doc. For most users I think maybe that's also the most important part, because I think many folks build quite differently. I will try to get to reading the script tomorrow.

docs/HowToBuildWithPGO.rst
41	Maybe s/test suites/lit tests/? Because I assume it's not building test-suite?
43	Including solid coverage for emitting all kinds of C++ diagnostics, whereas in many user builds none would be emitted. This is just a nit-pick, I'm not sure if there's a better way, but that might be one downside of using these kinds of tests for training.
69	3 and 4 kind of go together? I guess the outputs have to be merged, but the profile really is built while running the benchmarks, or that's how I think about it.
71	nit: s/with/using/ maybe?
80	Cool! I didn't know about this one.
92	But the text says it's running the test suite instead?
110	instead of freshly-optimized, maybe say PGO-optimized or something similar to help distinguish the different kinds of clang getting built

aganea added a subscriber: aganea. Oct 25 2018, 8:25 AM

Address feedback

Thank you!

For most users I think maybe that's also the most important part, because I think many folks build quite differently

Agreed. The script is only really made to cover simple builds/cases where users just want a general profile and are happy to pipe that through their own build logic.

If users can't use the script itself, my hope is that --dry-run will serve as a more easily executable form of documentation, and that the script itself ends up being a gentle reminder to update the docs when it breaks. :)

docs/HowToBuildWithPGO.rst
43	Not quite; I think I worded the above badly. We basically do two things to "train" the instrumented clang/llvm:
	- In the instrumented clang/llvm's build directory, run all lit/unit tests
	- In a new build directory, build everything with the instrumented clang/llvm
	The hope is that "build everything" will strongly bias hot paths toward the common-ish "your code is pretty OK" paths. The other tests may bias some colder branches the wrong way, but I'd imagine that:
	- That's not really an issue in practice, hence "colder"
	- For cases that aren't e.g. building code for the host arch, we'll still get some coverage for what's hot/not, as long as we have the relevant backends enabled.
	Tried to clarify above a bit. Please let me know if it's still unclear.
69	shrug I'm not strongly opinionated, so SGTM :)
92	Tried to address this above as part of my response to your second comment
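The `--dry-run` idea mentioned above can be sketched in a few lines of Python. This is a minimal illustration, not the script's actual code: the function name `run_command` and its parameters are chosen here for the example, and the only behavior assumed is "print the command, and skip execution when a dry run is requested".

```python
import shlex
import subprocess


def run_command(cmd, cwd=None, dry_run=False):
    """Print `cmd` in copy-pasteable form; execute it unless dry_run is set."""
    printable = ' '.join(shlex.quote(part) for part in cmd)
    print('Running `%s`' % printable)
    if dry_run:
        # Nothing is executed, so the printed line doubles as documentation.
        return None
    return subprocess.call(cmd, cwd=cwd)


# With dry_run=True the command is only printed, so this is safe to try anywhere.
result = run_command(['ninja', 'check-llvm', 'check-clang'], dry_run=True)
```

Because the printed commands are shell-quoted, a user whose build setup differs can copy them out and adapt them by hand.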

lgtm

docs/HowToBuildWithPGO.rst
43	Ah, that makes sense. The new text is very clear, thanks.
utils/collect_and_build_with_pgo.py
167	Since you're quoting the command, maybe quote the dir too for consistency, especially in case someone is brave enough to have spaces in it :-)
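A small sketch of the quoting suggestion above, assuming the log line keeps its current shape. `describe_command` is a hypothetical helper introduced only for this example; `shlex.quote` is the standard-library function the script already uses for the command itself.

```python
import shlex


def describe_command(cmd, cwd):
    """Return a log line in which both the command and the directory are quoted."""
    cmd_str = ' '.join(shlex.quote(part) for part in cmd)
    return 'Running `%s` in %s' % (cmd_str, shlex.quote(cwd))


# A directory containing a space is wrapped in single quotes.
line = describe_command(['ninja', 'all'], '/tmp/my build')
print(line)
```

Unlike `repr`, `shlex.quote` leaves shell-safe paths untouched and only adds quotes when they are needed, so the output stays pasteable into a shell.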

This revision is now accepted and ready to land. Oct 26 2018, 2:23 AM

Thanks again!

Closed by commit rL345427: Add docs+a script for building clang/LLVM with PGO (authored by gbiv). Oct 26 2018, 1:58 PM

This revision was automatically updated to reflect the committed changes.

LLVM's CMake has built in support for PGO; see https://llvm.org/docs/AdvancedBuilds.html#multi-stage-pgo. I haven't looked at the script in detail, but does it function similarly?

llvm/trunk/docs/HowToBuildWithPGO.rst
21 ↗	(On Diff #171345)	"the the"

Thanks for pointing that out!

Yeah, someone forwarded that to me off-list yesterday. Apologies for the sorta-duplication here :)

I imagine the cmake support uses the same configuration flags/etc that this script does. When I have some free time, I hope to look more into it and make this script depend on the PGO cmake targets/etc (or, if the differences are tiny, turn this script into a thin wrapper around the cmake bits + something that can provide larger test-cases, since it isn't obvious to me that the cmake bits have preloaded tests beyond "hello world").

In any case, I'll update the docs with whatever I find/do

I notice that you're using LLVM_BUILD_INSTRUMENTED=IR, which corresponds to -fprofile-generate (IR-level profiling), instead of -fprofile-instr-generate (clang-level profiling). Did you play around with both and observe that IR-level profiling gave you better results?

Btw, I tried this out and got a 20% improvement on a self-host with PGO, which is pretty handy :)

I notice that you're using LLVM_BUILD_INSTRUMENTED=IR, which corresponds to -fprofile-generate (IR-level profiling), instead of -fprofile-instr-generate (clang-level profiling). Did you play around with both and observe that IR-level profiling gave you better results?

IR-level profiling gave me better results for unrelated projects in the past, so not initially, but it's probably a good idea.

A quick experiment shows that ninja opt consumes 2.5% fewer user cycles when clang is optimized with an IR-level profile rather than with a frontend one. Building an arbitrary-but-large cpp file (clang/lib/Sema/SemaOverload.cpp) shows a similar 4% win for the IR profile.
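For readers who want to reproduce this kind of comparison, the percentages are simple relative differences. The sketch below uses made-up cycle counts purely to show the arithmetic; they are not the measurements from the experiment above.

```python
def percent_fewer(baseline, candidate):
    """Percentage by which `candidate` is smaller than `baseline`."""
    return 100.0 * (baseline - candidate) / baseline


# Hypothetical cycle counts, chosen only to illustrate the calculation.
frontend_profile_cycles = 1000.0
ir_profile_cycles = 975.0

saving = percent_fewer(frontend_profile_cycles, ir_profile_cycles)
print('%.1f%% fewer user cycles' % saving)
```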

Btw, I tried this out and got a 20% improvement on a self-host with PGO, which is pretty handy :)

Woohoo!

Revision Contents

Path	Size
docs/HowToBuildWithPGO.rst	126 lines
docs/index.rst	4 lines
utils/collect_and_build_with_pgo.py	486 lines

Diff 170718

docs/HowToBuildWithPGO.rst

This file was added.

				=============================================================
				How To Build Clang and LLVM with Profile-Guided Optimizations
				=============================================================

				Introduction
				============

				PGO (Profile-Guided Optimization) allows your compiler to better optimize code
				for how it actually runs. Users report that applying this to Clang and LLVM can
				decrease overall compile time by 20%.

				This guide walks you through how to use PGO to build Clang.


				Using the script
				================

				We have a script at ``utils/collect_and_build_with_pgo.py``. This script is
				tested on a few Linux flavors, and requires a checkout of LLVM, Clang, and
				compiler-rt. Despite the name, it performs four clean builds of Clang, so it
				can take a while to run to completion. Please see the script's ``--help`` for
				more information on how to run it. If you want to get the most out of PGO for a
				particular use-case (e.g. compiling a specific large piece of software), please
				do read the section below on 'benchmark' selection.

				Please note that this script is only tested on a few Linux distros. Patches to
				add support for other platforms, as always, are highly appreciated. :)

				This script also supports a ``--dry-run`` option, which causes it to print
				important commands instead of running them.


				Selecting 'benchmarks'
				======================

				PGO does best when the profiles gathered represent how the user plans to use the
				compiler. Notably, highly accurate profiles of llc building x86_64 code aren't
				incredibly helpful if you're going to be targeting ARM.

				The script mentioned above will simply build a release Clang/LLVM for the host
				architecture and run the Clang/LLVM test suites. This should give:

				- solid coverage of building C++,
				- good coverage of building C,
				- great coverage of running optimizations,
				- great coverage of the backend for your host's architecture, and
				- some coverage of other architectures (if other arches are supported backends).

				Altogether, this should cover a diverse set of uses for Clang and LLVM. If you
				have very specific needs (e.g. your compiler is meant to compile a large browser
				for four different architectures, or similar), you may want to do something
				else. This is configurable in the script itself.


				Building Clang with PGO
				=======================

				If you prefer to not use the script, this briefly goes over how to build
				Clang/LLVM with PGO.

				First, you should have at least LLVM, Clang, and compiler-rt checked out
				locally.

				Next, at a high level, you're going to need to do the following:

				1. Build a standard Release Clang and the relevant libclang_rt.profile library
				2. Build Clang using the Clang you built above, but with instrumentation
				3. Build your "benchmark" (detailed above) with the second Clang that you built
				4. Generate a profile from the benchmark runs
				5. Build a final release Clang (along with whatever other binaries you need)
				with the profile collected from your benchmark.

				In more detailed steps:

				1. Configure a Clang build as you normally would. It's highly recommended that
				you use the Release configuration for this, since it will be used to build
				another Clang. Because you need Clang and supporting libraries, you'll want
				to build the ``all`` target (e.g. ``ninja all`` or ``make -j4 all``).
				2. Configure a Clang build as above, but add the following CMake args:
				- ``-DLLVM_BUILD_INSTRUMENTED=IR`` -- This causes us to build everything
				with instrumentation.
				- ``-DLLVM_BUILD_RUNTIME=No`` -- A few projects have bad interactions when
				built with profiling, and aren't necessary to build. This flag
				turns them off.
				- ``-DCMAKE_C_COMPILER=/path/to/stage1/clang`` - Use the Clang we built in
				step 1.
				- ``-DCMAKE_CXX_COMPILER=/path/to/stage1/clang++`` - Same as above.

				In this build directory, you simply need to build the ``clang`` target (and
				whatever supporting tooling your benchmark requires).
				3. Build your benchmark using the Clang generated in step 2. The 'standard'
				benchmark recommended is to build Clang. So, create yet another build
				directory, with the following CMake arguments:

				- ``-DCMAKE_C_COMPILER=/path/to/stage2/clang`` - Use the Clang we built in
				step 2.
				- ``-DCMAKE_CXX_COMPILER=/path/to/stage2/clang++`` - Same as above.

				If your users are avid fans of debug info, you may want to consider using the
				``RelWithDebInfo`` target, instead of ``Release``.

				It's recommended to build the ``all`` target for this, since more coverage is
				better coverage.
				4. You should now have a few ``*.profraw`` files in
				``path/to/stage2/profiles/``. You need to merge these using ``llvm-profdata``
				(even if you only have one! The profile merge transforms profraw into actual
				profile data, as well). This can be done with ``path/to/stage1/llvm-profdata
				merge -output=/path/to/output/profdata.prof
				path/to/stage2/profiles/*.profraw``.
				5. Now, build your final, freshly-optimized Clang. To do this, you'll want to
				pass the following additional arguments to CMake.

				- ``-DLLVM_PROFDATA_FILE=/path/to/output/profdata.prof``
				- ``-DCMAKE_C_COMPILER=/path/to/stage1/clang`` - Use the Clang we built in
				step 1.
				- ``-DCMAKE_CXX_COMPILER=/path/to/stage1/clang++`` - Same as above.

				From here, you can build whatever targets you need.

				Please note that you may see warnings about a mismatched profile in the
				build output. These are generally harmless. To silence them, you can add
				``-DCMAKE_C_FLAGS='-Wno-backend-plugin'
				-DCMAKE_CXX_FLAGS='-Wno-backend-plugin'`` to your CMake invocation.

				Congrats! You now have a Clang built with profile-guided optimizations, and you
				can delete all but the final build directory if you'd like.
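The CMake arguments listed in the detailed steps above can be summarized per build as a small Python sketch. This is an illustration under assumptions, not part of the patch: `stage_cmake_defines` is a hypothetical helper, the directory arguments are placeholders, and the `bin/` subdirectory follows the layout the accompanying script uses rather than the shorter paths written in the doc. The initial Release build (step 1) needs no extra arguments and is omitted.

```python
def stage_cmake_defines(stage1_dir, instrumented_dir, profdata_file):
    """Return {build name: list of -D arguments} for steps 2, 3 and 5."""

    def compilers(build_dir):
        # Point CMake at the clang/clang++ produced by an earlier build.
        return [
            '-DCMAKE_C_COMPILER=%s/bin/clang' % build_dir,
            '-DCMAKE_CXX_COMPILER=%s/bin/clang++' % build_dir,
        ]

    return {
        # Step 2: instrumented Clang, built by the stage 1 Clang.
        'instrumented': compilers(stage1_dir) + [
            '-DLLVM_BUILD_INSTRUMENTED=IR',
            '-DLLVM_BUILD_RUNTIME=No',
        ],
        # Step 3: the "benchmark" build, compiled by the instrumented Clang.
        'benchmark': compilers(instrumented_dir),
        # Step 5: final build, compiled by stage 1 using the merged profile.
        'optimized': compilers(stage1_dir) + [
            '-DLLVM_PROFDATA_FILE=%s' % profdata_file,
        ],
    }


defines = stage_cmake_defines('/path/to/stage1', '/path/to/stage2',
                              '/path/to/output/profdata.prof')
for name in ('instrumented', 'benchmark', 'optimized'):
    print(name, ' '.join(defines[name]))
```

Note that both the instrumented and the final build use the stage 1 compiler; only the benchmark build is compiled by the instrumented Clang, since that is the run that emits profile data.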

docs/index.rst


	.. toctree::			.. toctree::
	:hidden:			:hidden:

	CMake			CMake
	CMakePrimer			CMakePrimer
	AdvancedBuilds			AdvancedBuilds
	HowToBuildOnARM			HowToBuildOnARM
				HowToBuildWithPGO
	HowToCrossCompileBuiltinsOnArm			HowToCrossCompileBuiltinsOnArm
	HowToCrossCompileLLVM			HowToCrossCompileLLVM
	CommandGuide/index			CommandGuide/index
	GettingStarted			GettingStarted
	GettingStartedVS			GettingStartedVS
	FAQ			FAQ
	Lexicon			Lexicon
	HowToAddABuilder			HowToAddABuilder

	:doc:`CMake`			:doc:`CMake`
	An addendum to the main Getting Started guide for those using the `CMake			An addendum to the main Getting Started guide for those using the `CMake
	build system <http://www.cmake.org>`_.			build system <http://www.cmake.org>`_.

	:doc:`HowToBuildOnARM`			:doc:`HowToBuildOnARM`
	Notes on building and testing LLVM/Clang on ARM.			Notes on building and testing LLVM/Clang on ARM.

				:doc:`HowToBuildWithPGO`
				Notes on building LLVM/Clang with PGO.

	:doc:`HowToCrossCompileBuiltinsOnArm`			:doc:`HowToCrossCompileBuiltinsOnArm`
	Notes on cross-building and testing the compiler-rt builtins for Arm.			Notes on cross-building and testing the compiler-rt builtins for Arm.

	:doc:`HowToCrossCompileLLVM`			:doc:`HowToCrossCompileLLVM`
	Notes on cross-building and testing LLVM/Clang.			Notes on cross-building and testing LLVM/Clang.

	:doc:`GettingStartedVS`			:doc:`GettingStartedVS`
	An addendum to the main Getting Started guide for those using Visual Studio			An addendum to the main Getting Started guide for those using Visual Studio

utils/collect_and_build_with_pgo.py

This file was added.

Property	Old Value	New Value
File Mode	null	100755

				#!/usr/bin/env python3
				"""
				This script:
				- Builds clang with user-defined flags
				- Uses that clang to build an instrumented clang, which can be used to collect
				PGO samples
				- Builds a user-defined set of sources (default: clang) to act as a
				"benchmark" to generate a PGO profile
				- Builds clang once more with the PGO profile generated above

				This is a total of four clean builds of clang (by default). This may take a
				while. :)
				"""

				import argparse
				import collections
				import multiprocessing
				import os
				import shlex
				import shutil
				import subprocess
				import sys

				### User configuration


				# If you want to use a different 'benchmark' than building clang, make this
				# function do what you want. out_dir is the build directory for clang, so all
				# of the clang binaries will live under "${out_dir}/bin/". Using clang in
				# ${out_dir} will magically have the profiles go to the right place.
				#
				# You may assume that out_dir is a freshly-built directory that you can reach
				# in to build more things, if you'd like.
				def _run_benchmark(env, out_dir, include_debug_info):
				    """The 'benchmark' we run to generate profile data."""
				    target_dir = env.output_subdir('instrumentation_run')

				    # `check-llvm` and `check-clang` are cheap ways to increase coverage. The
				    # former lets us touch on the non-x86 backends a bit if configured, and the
				    # latter gives us more C to chew on (and will send us through diagnostic
				    # paths a fair amount, though the `if (stuff_is_broken) { diag() ... }`
				    # branches should still heavily be weighted in the not-taken direction,
				    # since we built all of LLVM/etc).
				    #
				    # These run in out_dir (the instrumented build); target_dir has not been
				    # configured by CMake yet at this point.
				    _build_things_in(env, out_dir, what=['check-llvm', 'check-clang'])

				    # Building tblgen gets us coverage; don't skip it. (out_dir may also not
				    # have them anyway, but that's less of an issue)
				    cmake = _get_cmake_invocation_for_bootstrap_from(
				        env, out_dir, skip_tablegens=False)

				    if include_debug_info:
				        cmake.add_flag('CMAKE_BUILD_TYPE', 'RelWithDebInfo')

				    _run_fresh_cmake(env, cmake, target_dir)

				    # Just build all the things. The more data we have, the better.
				    _build_things_in(env, target_dir, what=['all'])


				### Script


				class CmakeInvocation:
				    _cflags = ['CMAKE_C_FLAGS', 'CMAKE_CXX_FLAGS']
				    _ldflags = [
				        'CMAKE_EXE_LINKER_FLAGS',
				        'CMAKE_MODULE_LINKER_FLAGS',
				        'CMAKE_SHARED_LINKER_FLAGS',
				    ]

				    def __init__(self, cmake, maker, cmake_dir):
				        self._prefix = [cmake, '-G', maker, cmake_dir]

				        # Map of str -> (list|str).
				        self._flags = {}
				        for flag in CmakeInvocation._cflags + CmakeInvocation._ldflags:
				            self._flags[flag] = []

				    def add_new_flag(self, key, value):
				        self.add_flag(key, value, allow_overwrites=False)

				    def add_flag(self, key, value, allow_overwrites=True):
				        if key not in self._flags:
				            self._flags[key] = value
				            return

				        existing_value = self._flags[key]
				        if isinstance(existing_value, list):
				            existing_value.append(value)
				            return

				        if not allow_overwrites:
				            raise ValueError('Invalid overwrite of %s requested' % key)

				        self._flags[key] = value

				    def add_cflags(self, flags):
				        # No, I didn't intend to append ['-', 'O', '2'] to my flags, thanks :)
				        assert not isinstance(flags, str)
				        for f in CmakeInvocation._cflags:
				            self._flags[f].extend(flags)

				    def add_ldflags(self, flags):
				        assert not isinstance(flags, str)
				        for f in CmakeInvocation._ldflags:
				            self._flags[f].extend(flags)

				    def to_args(self):
				        args = self._prefix.copy()
				        for key, value in sorted(self._flags.items()):
				            if isinstance(value, list):
				                # We preload all of the list-y values (cflags, ...). If we've
				                # nothing to add, don't.
				                if not value:
				                    continue
				                value = ' '.join(value)

				            arg = '-D' + key
				            if value != '':
				                arg += '=' + value
				            args.append(arg)
				        return args


				class Env:
				    def __init__(self, llvm_dir, use_make, output_dir, default_cmake_args,
				                 dry_run):
				        self.llvm_dir = llvm_dir
				        self.use_make = use_make
				        self.output_dir = output_dir
				        self.default_cmake_args = default_cmake_args.copy()
				        self.dry_run = dry_run

				    def get_default_cmake_args_kv(self):
				        return self.default_cmake_args.items()

				    def get_cmake_maker(self):
				        return 'Ninja' if not self.use_make else 'Unix Makefiles'

				    def get_make_command(self):
				        if self.use_make:
				            return ['make', '-j{}'.format(multiprocessing.cpu_count())]
				        return ['ninja']

				    def output_subdir(self, name):
				        return os.path.join(self.output_dir, name)

				    def has_llvm_subproject(self, name):
				        if name == 'compiler-rt':
				            subdir = 'projects/compiler-rt'
				        elif name == 'clang':
				            subdir = 'tools/clang'
				        else:
				            raise ValueError('Unknown subproject: %s' % name)

				        return os.path.isdir(os.path.join(self.llvm_dir, subdir))

				    # Note that we don't allow capturing stdout/stderr. This works quite nicely
				    # with dry_run.
				    def run_command(self,
				                    cmd,
				                    cwd=None,
				                    check=False,
				                    silent_unless_error=False):
				        cmd_str = ' '.join(shlex.quote(s) for s in cmd)
				        print('Running `%s` in %s' % (cmd_str, repr(cwd or os.getcwd())))

				        if self.dry_run:
				            return

				        if silent_unless_error:
				            stdout, stderr = subprocess.PIPE, subprocess.STDOUT
				        else:
				            stdout, stderr = None, None

				        # Don't use subprocess.run because it's >= py3.5 only, and it's not too
				        # much extra effort to get what it gives us anyway.
				        popen = subprocess.Popen(
				            cmd,
				            stdin=subprocess.DEVNULL,
				            stdout=stdout,
				            stderr=stderr,
				            cwd=cwd)
				        stdout, _ = popen.communicate()
				        return_code = popen.wait(timeout=0)

				        if not return_code:
				            return

				        if silent_unless_error:
				            print(stdout.decode('utf-8', 'ignore'))

				        if check:
				            raise subprocess.CalledProcessError(
				                returncode=return_code, cmd=cmd, output=stdout, stderr=None)


				def _get_default_cmake_invocation(env):
				    inv = CmakeInvocation(
				        cmake='cmake', maker=env.get_cmake_maker(), cmake_dir=env.llvm_dir)
				    for key, value in env.get_default_cmake_args_kv():
				        inv.add_new_flag(key, value)
				    return inv


				def _get_cmake_invocation_for_bootstrap_from(env, out_dir,
				                                             skip_tablegens=True):
				    clang = os.path.join(out_dir, 'bin', 'clang')
				    cmake = _get_default_cmake_invocation(env)
				    cmake.add_new_flag('CMAKE_C_COMPILER', clang)
				    cmake.add_new_flag('CMAKE_CXX_COMPILER', clang + '++')

				    # We often get no value out of building new tblgens; the previous build
				    # should have them. It's still correct to build them, just slower.
				    def add_tablegen(key, binary):
				        path = os.path.join(out_dir, 'bin', binary)

				        # Check that this exists, since the user's allowed to specify their own
				        # stage1 directory (which is generally where we'll source everything
				        # from). Dry runs should hope for the best from our user, as well.
				        if env.dry_run or os.path.exists(path):
				            cmake.add_new_flag(key, path)

				    if skip_tablegens:
				        add_tablegen('LLVM_TABLEGEN', 'llvm-tblgen')
				        add_tablegen('CLANG_TABLEGEN', 'clang-tblgen')

				    return cmake


				def _build_things_in(env, target_dir, what):
				    cmd = env.get_make_command() + what
				    env.run_command(cmd, cwd=target_dir, check=True)


				def _run_fresh_cmake(env, cmake, target_dir):
				    if not env.dry_run:
				        try:
				            shutil.rmtree(target_dir)
				        except FileNotFoundError:
				            pass

				        os.makedirs(target_dir, mode=0o755)

				    cmake_args = cmake.to_args()
				    env.run_command(
				        cmake_args, cwd=target_dir, check=True, silent_unless_error=True)


				def _build_stage1_clang(env):
				    target_dir = env.output_subdir('stage1')
				    cmake = _get_default_cmake_invocation(env)
				    _run_fresh_cmake(env, cmake, target_dir)

				    # FIXME: The full build here is somewhat unfortunate. It's primarily
				    # because I don't know what to call libclang_rt.profile for arches that
				    # aren't x86_64 (and even then, it's in a subdir that contains clang's
				    # current version). It would be nice to figure out what target I can
				    # request to magically have libclang_rt.profile built for ${host}
				    _build_things_in(env, target_dir, what=['all'])
				    return target_dir


				def _generate_instrumented_clang_profile(env, stage1_dir, profile_dir,
				                                         output_file):
				    llvm_profdata = os.path.join(stage1_dir, 'bin', 'llvm-profdata')
				    if env.dry_run:
				        profiles = [os.path.join(profile_dir, '*.profraw')]
				    else:
				        profiles = [
				            os.path.join(profile_dir, f) for f in os.listdir(profile_dir)
				            if f.endswith('.profraw')
				        ]
				    cmd = [llvm_profdata, 'merge', '-output=' + output_file] + profiles
				    env.run_command(cmd, check=True)


				def _build_instrumented_clang(env, stage1_dir):
				    assert os.path.isabs(stage1_dir)

				    target_dir = os.path.join(env.output_dir, 'instrumented')
				    cmake = _get_cmake_invocation_for_bootstrap_from(env, stage1_dir)
				    cmake.add_new_flag('LLVM_BUILD_INSTRUMENTED', 'IR')

				    # libcxx's configure step messes with our link order: we'll link
				    # libclang_rt.profile after libgcc, and the former requires atexit from the
				    # latter. So, configure checks fail.
				    #
				    # Since we don't need libcxx or compiler-rt anyway, just disable them.
				    cmake.add_new_flag('LLVM_BUILD_RUNTIME', 'No')

				    _run_fresh_cmake(env, cmake, target_dir)
				    _build_things_in(env, target_dir, what=['clang', 'lld'])

				    profiles_dir = os.path.join(target_dir, 'profiles')
				    return target_dir, profiles_dir


				def _build_optimized_clang(env, stage1_dir, profdata_file):
				    if not env.dry_run and not os.path.exists(profdata_file):
				        raise ValueError('Looks like the profdata file at %s doesn\'t exist' %
				                         profdata_file)

				    target_dir = os.path.join(env.output_dir, 'optimized')
				    cmake = _get_cmake_invocation_for_bootstrap_from(env, stage1_dir)
				    cmake.add_new_flag('LLVM_PROFDATA_FILE', os.path.abspath(profdata_file))

				    # We'll get complaints about hash mismatches in `main` in tools/etc. Ignore
				    # it.
				    cmake.add_cflags(['-Wno-backend-plugin'])
				    _run_fresh_cmake(env, cmake, target_dir)
				    _build_things_in(env, target_dir, what=['clang'])
				    return target_dir


				Args = collections.namedtuple('Args', [
				    'do_optimized_build',
				    'include_debug_info',
				    'profile_location',
				    'stage1_dir',
				])


				def _parse_args():
				    parser = argparse.ArgumentParser(
				        description='Builds LLVM and Clang with instrumentation, collects '
				        'instrumentation profiles for them, and (optionally) builds things '
				        'with these PGO profiles. By default, it\'s assumed that you\'re '
				        'running this from your LLVM root, and all build artifacts will be '
				        'saved to $PWD/out.')
				    parser.add_argument(
				        '--cmake-extra-arg',
				        action='append',
				        default=[],
				        help='an extra arg to pass to all cmake invocations. Note that this '
				        'is interpreted as a -D argument, e.g. --cmake-extra-arg FOO=BAR will '
				        'be passed as -DFOO=BAR. This may be specified multiple times.')
				    parser.add_argument(
				        '--dry-run',
				        action='store_true',
				        help='print commands instead of running them')
				    parser.add_argument(
				        '--llvm-dir',
				        default='.',
				        help='directory containing an LLVM checkout (default: $PWD)')
				    parser.add_argument(
				        '--no-optimized-build',
				        action='store_true',
				        help='disable the final, PGO-optimized build')
				    parser.add_argument(
				        '--out-dir',
				        help='directory to write artifacts to (default: $llvm_dir/out)')
				    parser.add_argument(
				        '--profile-output',
				        help='where to output the profile (default is $out/pgo_profile.prof)')
				    parser.add_argument(
				        '--stage1-dir',
				        help='instead of having an initial build of everything, use the given '
				        'directory. It is expected that this directory will have clang, '
				        'llvm-profdata, and the appropriate libclang_rt.profile already built')
				    parser.add_argument(
				        '--use-debug-info-in-benchmark',
				        action='store_true',
				        help='use RelWithDebInfo instead of a regular build in the benchmark. '
				        'This increases benchmark execution time and disk space requirements, '
				        'but gives more coverage over debuginfo bits in LLVM and clang.')
				    parser.add_argument(
				        '--use-make',
				        action='store_true',
				        default=shutil.which('ninja') is None,
				        help='use Makefiles instead of ninja')

				    args = parser.parse_args()

				    llvm_dir = os.path.abspath(args.llvm_dir)
				    if args.out_dir is None:
				        output_dir = os.path.join(llvm_dir, 'out')
				    else:
				        output_dir = os.path.abspath(args.out_dir)

				    extra_args = {'CMAKE_BUILD_TYPE': 'Release'}
				    for arg in args.cmake_extra_arg:
				        if arg.startswith('-D'):
				            arg = arg[2:]
				        elif arg.startswith('-'):
				            raise ValueError('Unknown not- -D arg encountered; you may need '
				                             'to tweak the source...')
				        split = arg.split('=', 1)
				        if len(split) == 1:
				            key, val = split[0], ''
				        else:
				            key, val = split
				        extra_args[key] = val

				    env = Env(
				        default_cmake_args=extra_args,
				        dry_run=args.dry_run,
				        llvm_dir=llvm_dir,
				        output_dir=output_dir,
				        use_make=args.use_make,
				    )

				    if args.profile_output is not None:
				        profile_location = args.profile_output
				    else:
				        profile_location = os.path.join(env.output_dir, 'pgo_profile.prof')

				    result_args = Args(
				        do_optimized_build=not args.no_optimized_build,
				        include_debug_info=args.use_debug_info_in_benchmark,
				        profile_location=profile_location,
				        stage1_dir=args.stage1_dir,
				    )

				    return env, result_args


				def _looks_like_llvm_dir(directory):
				    """Arbitrary set of heuristics to determine if `directory` is an llvm dir.

				    Errs on the side of false-positives."""

				    contents = set(os.listdir(directory))
				    expected_contents = [
				        'CODE_OWNERS.TXT',
				        'cmake',
				        'docs',
				        'include',
				        'utils',
				    ]

				    if not all(c in contents for c in expected_contents):
				        return False

				    try:
				        include_listing = os.listdir(os.path.join(directory, 'include'))
				    except NotADirectoryError:
				        return False

				    return 'llvm' in include_listing


				def _die(*args, **kwargs):
				    kwargs['file'] = sys.stderr
				    print(*args, **kwargs)
				    sys.exit(1)


				def _main():
				    env, args = _parse_args()

				    if not _looks_like_llvm_dir(env.llvm_dir):
				        _die('Looks like %s isn\'t an LLVM directory; please see --help' %
				             env.llvm_dir)
				    if not env.has_llvm_subproject('clang'):
				        _die('Need a clang checkout at tools/clang')
				    if not env.has_llvm_subproject('compiler-rt'):
				        _die('Need a compiler-rt checkout at projects/compiler-rt')

				    def status(*args):
				        print(*args, file=sys.stderr)

				    if args.stage1_dir is None:
				        status('*** Building stage1 clang...')
				        stage1_out = _build_stage1_clang(env)
				    else:
				        stage1_out = args.stage1_dir

				    status('*** Building instrumented clang...')
				    instrumented_out, profile_dir = _build_instrumented_clang(env, stage1_out)
				    status('*** Running profdata benchmarks...')
				    _run_benchmark(env, instrumented_out, args.include_debug_info)
				    status('*** Generating profile...')
				    _generate_instrumented_clang_profile(env, stage1_out, profile_dir,
				                                         args.profile_location)

				    print('Final profile:', args.profile_location)
				    if args.do_optimized_build:
				        status('*** Building PGO-optimized binaries...')
				        optimized_out = _build_optimized_clang(env, stage1_out,
				                                               args.profile_location)
				        print('Final build directory:', optimized_out)


				if __name__ == '__main__':
				    _main()
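
The handling of `--cmake-extra-arg` values in `_parse_args` can be tried in isolation. The following is a standalone sketch of the same idea, not code from the patch: `parse_extra_cmake_args` is a hypothetical name, and it uses `str.partition` where the script uses `split('=', 1)`, which should behave the same for these inputs.

```python
def parse_extra_cmake_args(raw_args):
    """Turn ['FOO=BAR', '-DBAZ=1', 'QUX'] style strings into a dict of defines."""
    # Release is the default build type, as in the script above.
    defines = {'CMAKE_BUILD_TYPE': 'Release'}
    for arg in raw_args:
        if arg.startswith('-D'):
            arg = arg[2:]  # tolerate an explicit -D prefix
        elif arg.startswith('-'):
            raise ValueError('Only -D style arguments are supported: %r' % arg)
        # Split on the first '=' only, so values may themselves contain '='.
        key, _, value = arg.partition('=')
        defines[key] = value
    return defines


parsed = parse_extra_cmake_args(
    ['-DLLVM_ENABLE_ASSERTIONS=ON', 'FOO=a=b', 'BAR'])
print(sorted(parsed.items()))
```

A key given without a value maps to an empty string, which matches how `CmakeInvocation.to_args` emits a bare `-DKEY` in that case.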