Add some tips on benchmarking.
Disabling ASLR is a double-edged sword: you may conclude that one version is better than the other when the difference actually depends on the final memory layout. As such, you may reach a false conclusion about an optimization or a heuristic.
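To illustrate the concern, here is a rough sketch (not something from the patch) of timing a command over several runs with ASLR left enabled, so that each run sees a different layout and the spread becomes visible. The helper names `time_runs` and `summarize` are hypothetical, and wall-clock timing of a whole process is only an assumption about how one might measure.

```python
import statistics
import subprocess
import sys
import time


def time_runs(cmd, runs=10):
    """Run cmd `runs` times and return the wall-clock time of each run in seconds."""
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        # Each new process gets a fresh address-space layout when ASLR is enabled.
        subprocess.run(cmd, check=True, stdout=subprocess.DEVNULL)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (mean, sample standard deviation); needs at least two samples."""
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Placeholder workload: replace with the binary being benchmarked.
    mean, stdev = summarize(time_runs([sys.executable, "-c", "pass"], runs=5))
    print(f"mean={mean:.4f}s stdev={stdev:.4f}s")
```

If the difference between two versions is smaller than the spread seen here, a single fixed layout could plausibly favor either one.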
Is this documentation intended to benchmark LLVM itself?
Missing from the list above: pin the process to a particular CPU. We don't want the OS to reschedule it onto another core during the run, and not all cores necessarily have the same latency/bandwidth to memory.
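As a minimal sketch of the pinning suggestion, assuming Linux (where Python exposes `os.sched_setaffinity`), something like the following would restrict the current process to one CPU. The helper name `pin_to_cpu` is hypothetical, and which CPU to choose is left to the person running the benchmark.

```python
import os


def pin_to_cpu(cpu):
    """Pin the calling process to a single CPU and return the resulting affinity set."""
    if not hasattr(os, "sched_setaffinity"):
        # The affinity calls are not available on every platform.
        raise OSError("CPU affinity is not supported on this platform")
    os.sched_setaffinity(0, {cpu})  # pid 0 means the calling process
    return os.sched_getaffinity(0)


if __name__ == "__main__":
    # Pick a CPU this process is already allowed to run on.
    cpu = min(os.sched_getaffinity(0))
    print(pin_to_cpu(cpu))
```

From a shell, `taskset -c <cpu> <command>` achieves the same effect for an arbitrary command on Linux.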