|
| 1 | +================================== |
| 2 | +Benchmarking tips |
| 3 | +================================== |
| 4 | + |
| 5 | + |
| 6 | +Introduction |
| 7 | +============ |
| 8 | + |
| 9 | +When benchmarking a patch we want to reduce every source of noise as |
| 10 | +much as possible. How to do that depends heavily on the OS. |
| 11 | + |
| 12 | +Note that low noise is necessary but not sufficient: it does not |
| 13 | +exclude measurement bias. See |
| 14 | +https://www.cis.upenn.edu/~cis501/papers/producing-wrong-data.pdf for |
| 15 | +an example. |
| 16 | + |
| 17 | +General |
| 18 | +================================ |
| 19 | + |
| 20 | +* Use a high resolution timer, e.g. perf under Linux. |
| 21 | + |
| 22 | +* Run the benchmark multiple times to be able to recognize noise. |
| 23 | + |
| 24 | +* Disable as many processes or services as possible on the target system. |
| 25 | + |
| 26 | +* Disable frequency scaling, turbo boost and address space |
| 27 | + randomization (see OS specific section). |
| 28 | + |
| 29 | +* Link statically if the OS supports it. That avoids any variation that |
| 30 | + might be introduced by loading dynamic libraries. This can be done |
| 31 | + by passing ``-DLLVM_BUILD_STATIC=ON`` to cmake. |
| 32 | + |
| 33 | +* Try to avoid storage. On some systems you can use tmpfs. Putting the |
| 34 | + program, inputs and outputs on tmpfs avoids touching a real storage |
| 35 | + system, whose access times can vary considerably. |
| 36 | + |
| 37 | + To mount it (on Linux and FreeBSD at least):: |
| 38 | + |
| 39 | + mount -t tmpfs -o size=<XX>g none dir_to_mount |
| 40 | + |
| 41 | +Linux |
| 42 | +===== |
| 43 | + |
| 44 | +* Disable address space randomization:: |
| 45 | + |
| 46 | + echo 0 > /proc/sys/kernel/randomize_va_space |
| 47 | + |
| 48 | +* Set scaling_governor to performance:: |
| 49 | + |
| 50 | + for i in /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor |
| 51 | + do |
| 52 | + echo performance > "$i" |
| 53 | + done |
| 54 | + |
| 55 | +* Use https://github.com/lpechacek/cpuset to reserve cpus for just the |
| 56 | + program you are benchmarking. If using perf, reserve at least 2 cores |
| 57 | + so that perf runs on one and your program on another:: |
| 58 | + |
| 59 | + cset shield -c N1,N2 -k on |
| 60 | + |
| 61 | + This will move all threads out of N1 and N2. The ``-k on`` means |
| 62 | + that even kernel threads are moved out. |
| 63 | + |
| 64 | +* Disable the SMT sibling of each cpu you will use for the benchmark. The |
| 65 | + sibling of cpu N is listed in |
| 66 | + ``/sys/devices/system/cpu/cpuN/topology/thread_siblings_list`` and |
| 67 | + can be disabled (as cpu X below) with:: |
| 68 | + |
| 69 | + echo 0 > /sys/devices/system/cpu/cpuX/online |
| 70 | + |
| 71 | + |
| 72 | +* Run the program with:: |
| 73 | + |
| 74 | + cset shield --exec -- perf stat -r 10 <cmd> |
| 75 | + |
| 76 | + This will run the command after ``--`` on the shielded cpus. This |
| 77 | + particular perf command runs ``<cmd>`` 10 times and reports |
| 78 | + statistics. |
| 79 | + |
| 80 | +With these in place you can expect run-to-run variations of less than 0.1%. |
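
``perf stat -r`` already prints a variation figure, but timings collected some other way can be checked against the 0.1% target with a small helper. The following is only a sketch: the function name ``rel_stddev_percent`` is hypothetical, and it assumes one numeric timing per line on stdin and an ``awk`` that accepts ``/dev/stderr`` as an output file.

```shell
#!/bin/sh
# Hypothetical helper (not part of LLVM or perf): reads one timing per
# line on stdin and prints the sample standard deviation as a
# percentage of the mean, with three decimals.
rel_stddev_percent() {
    awk '
        NF { n++; sum += $1; sumsq += $1 * $1 }
        END {
            if (n < 2) {
                print "need at least 2 samples" > "/dev/stderr"
                exit 1
            }
            mean = sum / n
            var = (sumsq - n * mean * mean) / (n - 1)
            if (var < 0) var = 0    # guard against rounding error
            printf "%.3f\n", 100 * sqrt(var) / mean
        }'
}
```

For example, ``printf '99\n101\n' | rel_stddev_percent`` reports a spread of about 1.4%, which is far noisier than the setup described here should produce.
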
| 81 | + |
| 82 | +Linux Intel |
| 83 | +----------- |
| 84 | + |
| 85 | +* Disable turbo mode:: |
| 86 | + |
| 87 | + echo 1 > /sys/devices/system/cpu/intel_pstate/no_turbo |
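
Before a benchmarking session it can help to confirm that the settings described in this document are still in effect, since they do not survive a reboot. The sketch below only reads the files and changes nothing. The function name ``check_bench_env`` and its optional root-prefix argument are assumptions made for illustration (the prefix lets the function be tried against a fake directory tree); files that do not exist on the system are skipped.

```shell
#!/bin/sh
# Hypothetical helper: report settings that differ from the ones
# recommended in this document. Prints one line per problem and
# returns non-zero if any was found.
# $1 (optional) is a root prefix; empty means the real filesystem.
check_bench_env() {
    root=${1:-}
    status=0

    # Address space randomization should be disabled (0).
    f="$root/proc/sys/kernel/randomize_va_space"
    if [ -r "$f" ] && [ "$(cat "$f")" != "0" ]; then
        echo "ASLR is enabled"
        status=1
    fi

    # Every cpufreq governor should be "performance".
    for g in "$root"/sys/devices/system/cpu/cpu*/cpufreq/scaling_governor; do
        [ -r "$g" ] || continue    # also skips an unmatched glob
        if [ "$(cat "$g")" != "performance" ]; then
            echo "governor is not performance: $g"
            status=1
        fi
    done

    # With intel_pstate, no_turbo should be 1.
    f="$root/sys/devices/system/cpu/intel_pstate/no_turbo"
    if [ -r "$f" ] && [ "$(cat "$f")" != "1" ]; then
        echo "turbo is enabled"
        status=1
    fi

    return "$status"
}
```

Calling ``check_bench_env`` with no argument inspects the running system; silence and a zero exit status mean none of the checked settings deviates.
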