Add an extra dump point for functions: immediately after the profile information is attached.
This dump is enabled by the newly introduced -print-profile and -print-all options.
The reason is that in aggregate-only/perf2bolt mode, BOLT may never reach the point where
functions are printed after CFG construction (-print-cfg), yet we may still want to inspect
the attached profile, especially for diffing purposes.