Diff 100007

docs/Benchmarking.rst

This file was added.

==================================
Benchmarking tips
==================================

kristof.beyls: I think that at the moment, these are mainly tips specifically for how to set up a system to reduce noisiness in benchmark results. Therefore, I'm wondering if the title should be a bit more specific than "Benchmarking tips". But I also can't immediately come up with a title that's much better...

Introduction
============

For benchmarking a patch, we want to reduce all possible sources of
noise as much as possible. How to do that depends heavily on the
operating system.
kristof.beyls: nitpick: s/to be in control of all the possible sources of noise/reduce all possible sources of noise as much as possible/? I think that's slightly better, but don't feel strongly about this change.

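Note that low noise is necessary but not sufficient: it does not
rule out measurement bias. See "Producing Wrong Data Without Doing
Anything Obviously Wrong!" (Mytkowicz et al., ASPLOS 2009) at
https://www.cis.upenn.edu/~cis501/papers/producing-wrong-data.pdf for
examples.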

General
=======

emaste: Should we have a "General" section with this note? Certainly this is true for FreeBSD as well, but it seems like it will apply everywhere. The advice could also be in a general section, with just the specific commands specific to the OS perhaps. For example, the tmpfs explanation also applies to FreeBSD, and even on other operating systems the general advice is sound. (Actually, the command is also identical on FreeBSD for tmpfs, but probably not elsewhere.)

* Use a high-resolution timer, e.g. ``perf`` on Linux.

* Run the benchmark multiple times so that you can recognize noise.

* Disable as many processes and services as possible on the target system.

* Disable frequency scaling, turbo boost and address space
  randomization (see the OS-specific section).

kristof.beyls: At http://lnt.llvm.org/quickstart.html#running-tests, there are a few hints under number 3 on how to reduce noise and cope with the remaining noise better when you use LNT for benchmarking. Would it be useful to have a pointer here to there? Maybe some of the hints there might also make sense to have here? Or maybe some of them should just be moved here, and there can be a pointer from there to here?

* Link statically if the OS supports it. This avoids any variation that
  might be introduced by loading dynamic libraries. When building LLVM,
  this can be done by passing ``-DLLVM_BUILD_STATIC=ON`` to CMake.

kristof.beyls: Results will be less noisy with address space randomization turned off, but I continue to think that incorrectly skews experiment results. If you recommend this, maybe you also need to recommend to align all functions to a reasonably large offset so that small code changes in one function have less of a probability of affecting code layout and the associated performance impact in another function?

* Avoid touching storage where possible. On some systems you can use
  tmpfs: putting the program, its inputs and its outputs on tmpfs avoids
  touching a real storage system, whose performance can vary considerably.

  To mount it (on Linux and FreeBSD at least)::

    mount -t tmpfs -o size=<XX>g none dir_to_mount
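
One way to check whether the tips above actually reduce noise is to time the same command several times and look at the spread. The following is a minimal sketch, assuming bash 4 or later and a POSIX awk; ``repeat_time`` is a hypothetical helper, not an existing tool. Bash's ``time`` only has millisecond resolution, so for short-running benchmarks a higher-resolution tool such as ``perf stat -r`` is still preferable.

```shell
#!/usr/bin/env bash
# Sketch: run a command several times and summarise its wall-clock times.
# Assumes bash 4+ (for the `time` keyword and TIMEFORMAT) and a POSIX awk.
set -euo pipefail
export LC_ALL=C   # force '.' as the decimal separator in time's output

# repeat_time RUNS CMD [ARGS...]   (hypothetical helper, not an existing tool)
# Prints run count, min, max, mean, sample stddev (seconds) and the
# coefficient of variation (stddev / mean) in percent.
repeat_time() {
  local runs=$1
  shift
  (( runs > 0 )) || { echo "RUNS must be positive" >&2; return 1; }
  local TIMEFORMAT=%3R   # make `time` report only real seconds, 3 decimals
  local i
  for ((i = 0; i < runs; i++)); do
    # Discard the command's own output; only time's report reaches the pipe.
    { time "$@" >/dev/null 2>&1; } 2>&1
  done | awk '
    { t[NR] = $1 + 0; sum += t[NR] }
    END {
      mean = sum / NR; min = t[1]; max = t[1]
      for (i = 1; i <= NR; i++) {
        ss += (t[i] - mean) ^ 2
        if (t[i] < min) min = t[i]
        if (t[i] > max) max = t[i]
      }
      sd = (NR > 1) ? sqrt(ss / (NR - 1)) : 0
      cv = (mean > 0) ? 100 * sd / mean : 0
      printf "runs=%d min=%.3f max=%.3f mean=%.3f sd=%.4f cv=%.2f%%\n", NR, min, max, mean, sd, cv
    }'
}

repeat_time 5 sleep 0.1
```

For example, ``repeat_time 10 ./my_benchmark`` (a hypothetical binary) reports the coefficient of variation; if it does not shrink after applying a tip, that tip is probably not the dominant noise source on that machine.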

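Linux
=====

Most of the commands below write to ``/proc`` or ``/sys`` and must be
run from a root shell; ``sudo echo 0 > file`` does not work, because
the redirection is performed by the unprivileged shell.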

* Disable address space randomization::

    echo 0 > /proc/sys/kernel/randomize_va_space

* Disable turbo mode (this file exists only on Intel CPUs using the
  ``intel_pstate`` driver)::

    echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo

inouehrs: Do you mean SMT (e.g. HyperThreading in Intel)? If you mean CPUs in the same socket, `core_siblings_list` can be used (but stopping sibling SMT threads is more important.)

kristof.beyls: This one seems linux x86 specific. Maybe worthwhile to move the linux-x86 specific ones into a sub-section of linux?

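
Before benchmarking, it can help to confirm that the address space randomization and turbo settings above actually took effect. The following is a minimal read-only sketch, assuming bash; ``show`` is a hypothetical helper. It changes nothing, needs no root, and reports ``n/a`` for files that do not exist, such as ``no_turbo`` on machines without the ``intel_pstate`` driver.

```shell
#!/usr/bin/env bash
# Read-only sketch: print the current ASLR and turbo settings.
# Changes nothing and needs no root.
set -u

# show NAME FILE  (hypothetical helper)
# Prints NAME=<contents of FILE>, or NAME=n/a if FILE cannot be read.
show() {
  local value
  if value=$(cat "$2" 2>/dev/null); then
    printf '%s=%s\n' "$1" "$value"
  else
    printf '%s=n/a\n' "$1"
  fi
}

show randomize_va_space /proc/sys/kernel/randomize_va_space   # want 0
show no_turbo /sys/devices/system/cpu/intel_pstate/no_turbo    # want 1
```
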
* Set ``scaling_governor`` to ``performance`` on every CPU::

    for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor
    do
      echo performance > "$i"
    done

inouehrs: Maybe it is nice to mention about `numactl` command in case for benchmarking parallel programs.
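
The loop above writes the governor of every CPU that exposes cpufreq. As a hedged sketch (bash assumed; ``check_governors`` is a hypothetical helper name), the following read-only check lists any CPU that still reports a different governor, and distinguishes that case from a machine with no cpufreq support at all, as is common in VMs and containers.

```shell
#!/usr/bin/env bash
# Read-only sketch: report CPUs whose cpufreq governor is not "performance".
# Needs no root. Exit status: 0 = every checked CPU uses "performance",
# 1 = at least one does not, 2 = no cpufreq information was found.
# check_governors is a hypothetical helper name.
check_governors() {
  local f gov checked=0 status=0
  for f in /sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor; do
    [ -r "$f" ] || continue   # unmatched glob or unreadable file
    checked=$((checked + 1))
    gov=$(cat "$f")
    if [ "$gov" != performance ]; then
      # Print the CPU directory and its current governor.
      echo "${f%/cpufreq/scaling_governor}: $gov"
      status=1
    fi
  done
  if [ "$checked" -eq 0 ]; then
    echo "no cpufreq governors found"
    return 2
  fi
  if [ "$status" -eq 0 ]; then
    echo "all $checked CPUs use the performance governor"
  fi
  return "$status"
}

check_governors || echo "governor check failed (status $?)"
```
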

inouehrs: Simple explanation on the meaning of -r option will be valuable, i.e. -r 10 will increase the execution time 10x.

* Use https://github.com/lpechacek/cpuset to reserve CPUs for just the
  program you are benchmarking. If you use perf, reserve at least two
  CPUs so that perf runs on one and your program on another::

    cset shield -c N1,N2 -k on

  This moves all threads off CPUs N1 and N2. The ``-k on`` option
  moves kernel threads off them as well.

* Disable the SMT siblings of the CPUs you will use for the benchmark.
  The siblings of CPU N are listed in
  ``/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`` (the
  list includes N itself), and each sibling X can be taken offline with::

    echo 0 > /sys/devices/system/cpu/cpuX/online


* Run the program with::

    cset shield --exec -- perf stat -r 10 <cmd>

  This runs the command after ``--`` on the isolated CPUs. The
  ``perf stat -r 10`` command runs ``<cmd>`` 10 times, so the total run
  time grows roughly tenfold, and reports the average of each counter
  together with its variation across runs.

With all of these in place, you can expect run-to-run variations of
less than 0.1% in perf measurements.
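
The SMT, cset and perf steps above can be combined into a dry run that prints the exact commands for a given set of CPUs, so they can be reviewed before being run as root. This is a sketch under stated assumptions: bash 4 or later, ``cset`` and ``perf`` installed, and the kernel's cpulist format (for example ``0-3,8``) in ``thread_siblings_list``; ``expand_cpulist``, ``plan_benchmark`` and ``./my_benchmark`` are hypothetical names. It does not cover the address space randomization, turbo or governor settings.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the SMT / cset / perf steps. It reads each CPU's SMT
# siblings from sysfs and PRINTS (does not run) the commands to offline the
# siblings, create the cset shield and run the benchmark under perf.
# Requires bash 4+ (associative arrays). Helper names are hypothetical.
set -eo pipefail

# expand_cpulist "0-2,8" prints "0 1 2 8" (the kernel's cpulist format).
expand_cpulist() {
  local -a parts out=()
  local part i
  IFS=, read -ra parts <<< "$1"
  for part in "${parts[@]}"; do
    if [[ $part == *-* ]]; then
      for ((i = ${part%-*}; i <= ${part#*-}; i++)); do out+=("$i"); done
    else
      out+=("$part")
    fi
  done
  echo "${out[*]}"
}

# plan_benchmark CPULIST CMD [ARGS...]
plan_benchmark() {
  local cpus=$1
  shift
  (( $# > 0 )) || { echo "usage: plan_benchmark CPULIST CMD [ARGS...]" >&2; return 1; }
  local cpu sib f
  local -A offline=()
  for cpu in $(expand_cpulist "$cpus"); do
    f=/sys/devices/system/cpu/cpu$cpu/topology/thread_siblings_list
    [[ -r $f ]] || continue   # no topology info (e.g. some containers)
    for sib in $(expand_cpulist "$(<"$f")"); do offline[$sib]=1; done
  done
  # Keep every CPU that is itself part of the benchmark set online.
  for cpu in $(expand_cpulist "$cpus"); do unset "offline[$cpu]"; done
  if (( ${#offline[@]} > 0 )); then
    for sib in $(printf '%s\n' "${!offline[@]}" | sort -n); do
      echo "echo 0 > /sys/devices/system/cpu/cpu$sib/online"
    done
  fi
  echo "cset shield -c $cpus -k on"
  printf 'cset shield --exec -- perf stat -r 10'
  printf ' %q' "$@"   # shell-quote each argument of the benchmark command
  echo
}

# Example: shield CPUs 2 and 3 for a hypothetical ./my_benchmark binary.
plan_benchmark 2,3 ./my_benchmark
```

Printing instead of executing keeps the script safe to run anywhere; after reviewing the output, it can be run from a root shell to apply the plan.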

docs/index.rst

.. toctree::

   Passes
   YamlIO
   GetElementPtr
   Frontend/PerformanceTips
   MCJITDesignAndImplementation
   CodeOfConduct
   CompileCudaWithLLVM
   ReportingGuide
   Benchmarking

:doc:`GettingStarted`
   Discusses how to get up and running quickly with the LLVM infrastructure.
   Everything from unpacking and compilation of the distribution to execution
   of some tools.

:doc:`CMake`
   An addendum to the main Getting Started guide for those using the `CMake

This is an archive of the discontinued LLVM Phabricator instance.

Add some tips on how to benchmark
Closed, Public
