PGO instrumentation can incur a large slowdown, especially for highly threaded
programs -- we have seen 100x. This patch adds sampling support for profile
instrumentation.
It transforms:
  Increment_Instruction;
  Instructions_after;
to:
  CountVar = CountVar + 1;
  if (CountVar <= SampleDuration)
    Increment_Instruction;
  else if (CountVar >= WholeDuration)
    CountVar = 0;
  Instructions_after;
Here CountVar is a thread-local global shared by all PGO instrumentation
variables (value-instrumentation and edge instrumentation).
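As an illustration only, the transformation above can be modeled in plain C. This is a minimal sketch of the counting logic, not the code the pass emits: the names sampled_increment and simulate are hypothetical, and the constants assume that the 100:100019 rate quoted in this message means SampleDuration = 100 and WholeDuration = 100019.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values: taken from the 100:100019 rate quoted in this message,
   read as SampleDuration:WholeDuration. */
enum { SAMPLE_DURATION = 100, WHOLE_DURATION = 100019 };

/* Models CountVar: one thread-local counter shared by every
   instrumentation site in the thread. */
static _Thread_local uint32_t count_var = 0;

/* Hypothetical helper standing in for one instrumented site.
   *counter plays the role of the profile counter updated by
   Increment_Instruction. */
static void sampled_increment(uint64_t *counter) {
    count_var = count_var + 1;
    if (count_var <= SAMPLE_DURATION)
        *counter += 1;            /* inside the sampled burst: count it */
    else if (count_var >= WHOLE_DURATION)
        count_var = 0;            /* end of the period: start a new burst */
    /* Instructions_after would follow here. */
}

/* Hypothetical driver: executes the site `executions` times on a fresh
   counter and returns how many executions were actually recorded. */
static uint64_t simulate(uint64_t executions) {
    assert(SAMPLE_DURATION < WHOLE_DURATION);
    uint64_t counter = 0;
    count_var = 0;
    for (uint64_t i = 0; i < executions; ++i)
        sampled_increment(&counter);
    return counter;
}
```

Under these assumptions, each period of 100019 executions records only the first 100, so roughly 0.1% of counter updates are performed; the remaining executions pay only for the CountVar update and the comparison.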
Some statistics we collected using one of our large, highly threaded programs,
with the default sample rate of 100:100019:
- Sampling speeds up the instrumented binary by 3.3x.
- The overlap tool shows that the resulting profiles are close:
    Edge profile overlap: 92.771%
    IndirectCall profile overlap: 80.493%
    MemOP profile overlap: 95.114%
The FDO-optimized binary built with the sampled profile is performance neutral for the above application.
Compile time can increase due to the added control flow.