I'm bring up the support of pseudo-probe-based non-CS profile generation. The approach is quite similar to generating dwarf-based non-CS profile. The main difference is for a given linear instruction range, instead of each disassembled instruction, pseudo probes that are covered by the range are processed. The pseudo probe extraction code is shared with CS probe profile generation.
I'm seeing 0.7% performance win for one of our internal large benchmark compared to using non-CS dwarf-based profile, and 0.5% win for another large benchmark when combined with profi.
Out of curiosity, for the text profile, how do we know the profile is a CS nested profile or a non-CS nested profile?