Change the container type of SampleProfileMap from std::unordered_map to llvm::DenseMap. This gives a speedup of up to 8% (31.4s vs 29.0s) when reading a large test profile, and about 5% (0.82s vs 0.78s) when reading the function offset table alone.
DenseMap does not guarantee that references stay valid after an insertion (a rehash may move the elements), so any code that holds a reference into SampleProfiles while inserting into the map must be modified to avoid a dangling reference.
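A minimal sketch of the kind of rewrite this implies, not the actual patch. The names `Profile` and `mergeInto` and the use of `uint64_t` keys are hypothetical. The function is a template so it can be compiled here against std::unordered_map; the same pattern is intended for a map like llvm::DenseMap, where the insertion may move every element.

```cpp
#include <cassert>
#include <cstdint>
#include <unordered_map>

// Hypothetical stand-in for a sample profile; not the real FunctionSamples.
struct Profile {
  uint64_t TotalSamples = 0;
};

// Adds the samples of profile `From` to profile `To`, creating `To` if it
// does not exist yet. Written so that no reference or iterator into the map
// is used after the insertion.
template <typename MapT>
void mergeInto(MapT &Profiles, uint64_t From, uint64_t To) {
  auto It = Profiles.find(From);
  assert(It != Profiles.end() && "source profile must exist");

  // Copy what is needed BEFORE inserting. Keeping `It` or a reference to
  // `It->second` and reading it after `Profiles[To]` would be unsafe with a
  // container that rehashes by moving its elements.
  const uint64_t Samples = It->second.TotalSamples;

  // This may insert a new entry and trigger a rehash.
  Profiles[To].TotalSamples += Samples;
}
```

The alternative is to look the source entry up again after the insertion, which costs a second hash lookup but avoids copying a large value.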
flatten does not modify the original sample profiles (it takes them by const reference) and applies the same logic to each profile, so it should be agnostic to traversal order: processing the profiles in any order should yield the same results.
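A toy illustration of the order-independence argument, under strong simplifying assumptions: `NestedProfile`, `FlatMap` and `flattenAll` are hypothetical and much simpler than the real flatten logic. The input is only read, and the output is built by additive accumulation, so the result does not depend on the order in which the profiles are visited.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified nested profile: own samples plus inlined callees.
struct NestedProfile {
  std::string Name;
  uint64_t BodySamples;
  std::vector<std::pair<std::string, uint64_t>> InlinedCallees;
};

using FlatMap = std::map<std::string, uint64_t>;

// Reads Input (const, never modified) and accumulates into a separate output.
// Each profile contributes only sums, so visiting order does not matter.
FlatMap flattenAll(const std::vector<NestedProfile> &Input) {
  FlatMap Out;
  for (const NestedProfile &P : Input) {
    Out[P.Name] += P.BodySamples;
    for (const auto &Callee : P.InlinedCallees)
      Out[Callee.first] += Callee.second;
  }
  return Out;
}
```

Calling `flattenAll` on a vector and on the same vector in reverse order should produce equal maps, which is the property relied on when the iteration order of the container changes.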