Page MenuHomePhabricator

[CSSPGO][llvm-profgen] Pseudo probe decoding and disassembling

Authored by wlei on Nov 30 2020, 10:23 AM.



This change implements pseudo probe decoding and disassembling for llvm-profgen/CSSPGO. Please see and for more context about CSSPGO and llvm-profgen.

ELF section format
Please see the encoding patch( for more details of the format, just copy the example here:

Two section(.pseudo_probe_desc and  .pseudoprobe ) is emitted in ELF to support pseudo probe.
The format of .pseudo_probe_desc section looks like:

.section   .pseudo_probe_desc,"",@progbits
.quad   6309742469962978389  // Func GUID
.quad   4294967295           // Func Hash
.byte   9                    // Length of func name
.ascii  "_Z5funcAi"          // Func name
.quad   7102633082150537521
.quad   138828622701
.byte   12
.ascii  "_Z8funcLeafi"
.quad   446061515086924981
.quad   4294967295
.byte   9
.ascii  "_Z5funcBi"
.quad   -2016976694713209516
.quad   72617220756
.byte   7
.ascii  "_Z3fibi"

For each .pseudoprobe section, the encoded binary data consists of a single function record corresponding to an outlined function (i.e, a function with a code entry in the .text section). A function record has the following format :

FUNCTION BODY (one for each outlined function present in the text section)
    GUID (uint64)
        GUID of the function
        Number of probes originating from this function.
        Number of callees inlined into this function, aka number of
        first-level inlinees
        A list of NPROBES entries. Each entry contains:
          INDEX (ULEB128)
          TYPE (uint4)
            0 - block probe, 1 - indirect call, 2 - direct call
          ATTRIBUTE (uint3)
          ADDRESS_TYPE (uint1)
            0 - code address, 1 - address delta
          CODE_ADDRESS (uint64 or ULEB128)
            code address or address delta, depending on ADDRESS_TYPE
        A list of NUM_INLINED_FUNCTIONS entries describing each of the inlined
        callees.  Each record contains:
          INLINE SITE
            GUID of the inlinee (uint64)
            ID of the callsite probe (ULEB128)
            A FUNCTION BODY entry describing the inlined function.

A switch --show-pseudo-probe is added to use along with --show-disassembly to print disassembly code with pseudo probe directives.

For example:

00000000002011a0 <foo2>:
  2011a0: 50                    push   rax
  2011a1: 85 ff                 test   edi,edi
  [Probe]:  FUNC: foo2  Index: 1  Type: Block
  2011a3: 74 02                 je     2011a7 <foo2+0x7>
  [Probe]:  FUNC: foo2  Index: 3  Type: Block
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  [Probe]:  FUNC: foo   Index: 1  Type: Block  Inlined: @ foo2:6
  2011a5: 58                    pop    rax
  2011a6: c3                    ret
  [Probe]:  FUNC: foo2  Index: 2  Type: Block
  2011a7: bf 01 00 00 00        mov    edi,0x1
  [Probe]:  FUNC: foo2  Index: 5  Type: IndirectCall
  2011ac: ff d6                 call   rsi
  [Probe]:  FUNC: foo2  Index: 4  Type: Block
  2011ae: 58                    pop    rax
  2011af: c3                    ret


  • PseudoProbeDecoder is added in ProfiledBinary as an infra for the decoding. It decoded the two section and generate two map: GUIDProbeFunctionMap stores all the PseudoProbeFunction which is the abstraction of a general function. AddressProbesMap stores all the pseudo probe info indexed by its address.
  • All the inline info is encoded into binary as a trie(PseudoProbeInlineTree) and will be constructed from the decoding. Each pseudo probe can get its inline context(getInlineContext) by traversing its inline tree node backwards.

Test Plan:
ninja & ninja check-llvm

Diff Detail

Event Timeline

wlei created this revision.Nov 30 2020, 10:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 30 2020, 10:23 AM
wlei requested review of this revision.Nov 30 2020, 10:23 AM
wlei edited the summary of this revision. (Show Details)Nov 30 2020, 12:20 PM
wlei added reviewers: hoy, wenlei, wmi, davidxl.
wlei updated this revision to Diff 308805.Dec 1 2020, 3:57 PM

remove deprecated code(pseudo probe CFG decoding) which is replaced by other design inside llvm

hoy added inline comments.Dec 3 2020, 5:16 PM
113 ↗(On Diff #308418)

A comment on the clang command line to build the program?


Some parts of this file can be merged with llvm/include/llvm/IR/PseudoProbe.h. Consider moving non-decoding-specific stuff into the common header?


May just use `std::pair' which has predefined hash and equal operators?


Make it a debug field? i.,e under #ifndef NDEBUG.

wlei updated this revision to Diff 309575.Dec 4 2020, 10:27 AM

address Hongtao's feedback

wlei added inline comments.Dec 4 2020, 10:47 AM

Seems std::pair only support std::map predefine comparator, has to define the hash operator for unordered_map,
so it will be like:

using InlineSite = std::pair<uint64_t, uint64_t>;
 struct InlineSiteHash {
    uint64_t operator()(const InlineSite &Site) const {
      return Site.first^ Site.second;
  std::unordered_map<InlineSite, std::unique_ptr<PseudoProbeInlineTree>,
                     InlineSiteHash>    Children;

is that what you think?


debug macro added, thanks for your suggestions.

hoy added inline comments.Dec 4 2020, 10:57 AM

yeah, and perhaps InlineSiteHash is not needed.

wlei updated this revision to Diff 310365.Dec 8 2020, 3:12 PM

Address Hongtao's feedback, reuse the code from pseudo probe encoding and some refactoring work

hoy added inline comments.Dec 9 2020, 9:24 AM

Please use class instead of struct to hide member fields like ProbeVector and Children.


Nit: consider renaming this as PseudoProbeFuncDesc.


Declare as static to save space.


Nit: add spaces between functions and fields.


const qualifier for these getters.


Nit: consider using SmallVector for fewer allocations and better locality.

wlei updated this revision to Diff 310672.Dec 9 2020, 2:44 PM

Address Hongtao's feedback

wlei added inline comments.Dec 9 2020, 3:36 PM

After the encoding patches committed, I will do the rebase and merge other parts, like PseudoProbeAttributes


Thanks for your suggestions. I intend to use std::list here to use push_front() to avoid some string reversion.
Because our input is the tree leaf, need to travel backwards to extract the context.
e.g. we have two probes, each has inlined context, like foo (main -> foo) and baz (bar -> baz)
we first process foo, extract the context: "main @ foo", "main" is pushed front of the context.
then we process` baz`, extract the context: "bar @ baz", at last putting them together will get the full context "main @ foo @ bar @ baz".

hoy added inline comments.Dec 9 2020, 3:53 PM

Thanks for the explanation. Can we use vector push_back to generate a reversed context first and then return a new vector by reversing it? We tend to not use std::list when possible because every time a new node is added to the list, an underlying malloc call will be triggered. Also nodes in list are not memory consecutive and vectorization cannot be done on a list.

wlei added inline comments.Dec 9 2020, 3:57 PM

Yeah, that's a good idea, we can iterate the vector in reversed order no need for reversing the vector. Let me change this! Thanks!

wlei updated this revision to Diff 310718.Dec 9 2020, 4:47 PM

Chang to use SmallVector instead of std::list

hoy added a comment.Dec 9 2020, 6:26 PM

LGTM. Thanks for addressing all feedbacks! Let's wait to see if there's any required change to the encoding format.

hoy accepted this revision.Dec 10 2020, 9:17 AM
This revision is now accepted and ready to land.Dec 10 2020, 9:17 AM
wlei updated this revision to Diff 310976.Dec 10 2020, 12:08 PM

change to use post-order traversal to extract inlined context stack

wlei added inline comments.Dec 10 2020, 12:15 PM

@hoy I'm just aware that we can use post-order traversal to fill in the vector recursively, which will make the context in caller-callee order, no need to do reversing. Also I'm aware that we need to do compression on the vector not the string, caller-callee order is easy for this. The inlined context size for one probe should be small, no stack overflow risk. So I just updated to change this, what do you think?

wmi added a comment.Jan 8 2021, 3:59 PM

Sorry for the delay in review. I will try to catch up.


Nit: How about define a isDummyRoot() function in PseudoProbeInlineTree for it?


If it is traversing from leaf to root, why not just use a loop instead of a recursive function?


Do we expect Data to be equal to End at the end of the loop? If that is true, better add an assertion.


Same here. If it is expected that Data to be equal to End at the end of the loop, add an assertion.


PseudoProbeInlineTree will not just represent node of inlined function but also node of outlined function, right? It is better to make the comment a little more precise. Same suggestion for the comment before class PseudoProbeInlineTree.


I assume PseudoProbeType of PseudoProbe will be direct call if Tree is not null. Indirect call has to be promoted or devirtualized to direct call before it can be inlined. Does it make sense to add an assertion here?


Nit: It is a little bit clearer to have a definition and use it also in other place.
const uint32_t PseudoProbeFirstId = static_cast<uint32_t>(PseudoProbeReservedId::Last) + 1;


This is a dummy node used to collect all the outlined functions in its Children field. Will be helpful to put it in the comment.

wlei updated this revision to Diff 315831.Jan 11 2021, 9:21 AM
wlei marked 2 inline comments as done.

Addressing Wei's feedback

wlei added a comment.Jan 11 2021, 9:25 AM
In D92334#2488123, @wmi wrote:

Sorry for the delay in review. I will try to catch up.

Thanks for your feedbacks, that's really helpful. Trying to address them.


Good suggestion!
As it's like DummyRoot --(Dummy inline site)--> OutlinedFunc, the condition is to check whether the OutlinedFunc's inline site is a dummy one. So I would like to change to the name hasInlineSite to indicate it has a real inline site not a dummy one.


I see, using recursion is for avoiding the reversing operation, but I'm also aware that the loop is good for the code locality, changed to loop.


Good suggestion. It's supposed to consume all the data here, assertion added


assertion added


Yeah, sorry for the confusion, this is unclear.
The real logic will be like:

DummyRoot -->OutlinedFunc --> InlinedFunc

Added the inline site , it will be like:

DummyRoot --(Dummy inline site)--> OutlinedFunc --(inline site)--> inlinedFunc

Refined the comments.


As mentioned above, for the tree like DummyRoot -->OutlinedFunc --> InlinedFunc, the Tree variable here can be either OutlinedFunc or InlinedFunc. It shouldn't be null.

I'm not quite sure about Indirect call type, I thought it has been already deprecated here. @hoy Could you help to confirm this? If so, I will add the assertion later.


Good point! fixed


more comments added.

hoy added inline comments.Jan 11 2021, 10:15 AM

The InlineTree of a probe indicates whether the probe itself is currently in an outlined function or from an inlined function, regardless of the probe's type. An indirect call probe from an inlinee can be inlined into the caller thus it ends up with an inline tree.

wmi accepted this revision.Jan 12 2021, 4:16 PM


This revision was landed with ongoing or failed builds.Jan 13 2021, 11:07 AM
This revision was automatically updated to reflect the committed changes.