This is an archive of the discontinued LLVM Phabricator instance.

[PseudoProbe] Replace relocation with offset for entry probe.
ClosedPublic

Authored by hoy on Oct 13 2022, 1:14 PM.

Details

Summary

Currently pseudo probe encoding for a function is like:

  • For the first probe, a relocation from it to its physical position in the code body
  • For subsequent probes, an incremental offset from the current probe to the previous probe

The relocation could potentially cause relocation overflow during link time. I'm now replacing it with an offset from the first probe to the function start address.

A source function could be lowered into multiple split functions due to outlining (e.g, coro-split). Since those split functions have independent link-time layout, to really avoid relocations from .pseudo_probe sections to .text sections, the offset to replace with should really be the offset from the probe's enclosing split function, rather than from the entry of the source function. This requires some changes to previous section-based emission scheme which now switches to be function-based. The assembly form of pseudo probe directive is also changed correspondingly, i.e, reflecting the split function name.

A sentinel probe is emitted for each of the functions with a different name from the source. The sentinel probe indicates the elf symbol name to differentiate subsequent probes from the ones from a different split function. For examples, given source function

Foo() {
  …
  Probe 1
  …
  Probe 2
}

If it is transformed into two split functions:

Foo:
   …

Foo.outlined:
   …

The encoding for the two split functions will be separate:

GUID of Foo
  Probe 1

GUID of Foo 
  Sentinel probe of Foo.outlined 
  Probe 2

Then probe1 will be decoded against binary Foo's address, and Probe 2 will be decoded against Foo.outlined. The sentinel probe of Foo.outlined makes sure there's not accidental relocation from Foo.outlined's probes to Foo's entry address.

Decoding change will be in separate patch.

Diff Detail

Event Timeline

hoy created this revision.Oct 13 2022, 1:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2022, 1:14 PM
hoy requested review of this revision.Oct 13 2022, 1:14 PM
Herald added a project: Restricted Project. · View Herald TranscriptOct 13 2022, 1:14 PM
hoy updated this revision to Diff 468246.Oct 17 2022, 10:12 AM

Updating D135912: [PseudoProbe] Replace relocation with offset for entry probe.

hoy edited the summary of this revision. (Show Details)Oct 17 2022, 10:17 AM

Not an expert, just some hints in case you like the suggestions.

llvm/include/llvm/MC/MCPseudoProbe.h
299

Is this over 80col?

llvm/lib/MC/MCPseudoProbe.cpp
151

typo Santinel->Sentinel

214–228

std::map is an expensive data structure. For usage patterns that are strictly "insert then query", considering using a vector + std::sort.

Reference https://llvm.org/docs/ProgrammersManual.html#dss-sortedvectormap

hoy marked 2 inline comments as done.Oct 21 2022, 12:41 PM
hoy added inline comments.
llvm/include/llvm/MC/MCPseudoProbe.h
299

Yes, it is. Fixed.

llvm/lib/MC/MCPseudoProbe.cpp
214–228

Good point. Changed to using vector+sort.

hoy updated this revision to Diff 469718.Oct 21 2022, 12:41 PM
hoy marked an inline comment as done.
hoy edited the summary of this revision. (Show Details)

Addressing feedbacks.

wenlei added inline comments.Oct 24 2022, 10:43 PM
llvm/include/llvm/MC/MCPseudoProbe.h
34–36

I thought we now have GUID for symbol name for sentinel, and address offset for non-sentinel. What does regular probe that doesn't use offset refer to?

287

do you want to rename this to MCPseudoProbeFunctions?

llvm/lib/MC/MCPseudoProbe.cpp
52

Did you mean assert((LastProbe || IsSentinel) && "...")?

80–81

To be accurate, this is the GUID of the (split) binary function name / elf symbol name?

140

nit: this message doesn't provide additional info other than the assertion itself. the message could be something more meaningful - why we don't expect root node here

155

Is this checking the top level binary function name is different from the top level source function name? And this is to avoid emitting sentinel for main function body?

215

nit: this is a comparator instead of a sorter. same for the one above.

wenlei added inline comments.Oct 24 2022, 10:54 PM
llvm/include/llvm/MC/MCPseudoProbe.h
17–20

Have you considered removing this GUID field, but always require a sentinel probe for any (split) binary function including the main body?

hoy marked 5 inline comments as done.Oct 25 2022, 10:10 AM
hoy added inline comments.
llvm/include/llvm/MC/MCPseudoProbe.h
17–20

The GUID field here is the GUID of the source/dwarf name, which may be different from the actually symbol linkage name, even for the main body of the function. The source GUID will be used to decode and generate a profile against the source function name.

34–36

This is mainly for downwards compatibility, i.e, to have the new llvm-profgen handle old binaries. Also Bolt re-encoding uses the old encoding scheme for simplicity.

287

This class kind of tells how probe encoding looks physically. Probes of a function will stay in a standalone section (comdat) and that's why section is used here. I'd like to keep it as is, but don't have a strong preference. WDYT?

llvm/lib/MC/MCPseudoProbe.cpp
52

Good catch. Yeah, that's what I actually want.

80–81

Exactly.

140

Fixed.

155

Exactly. Added a comment about that.

The encoding for a function with outlined code will look like below. Note that the main entry doesn't need a sentinel probe.

GUID of Foo
  Probe 1

GUID of Foo 
  Sentinel probe of Foo.outlined 
  Probe 2
hoy updated this revision to Diff 470541.Oct 25 2022, 10:12 AM
hoy marked 4 inline comments as done.

Addressing feedbacks.

For those don't, a sentinel probe is emitted for each of the binary functions with a different name from the source.

Nit but it indeed caused confusion for me: if the definition of binary function is each MCSymbol, the above isn't accurate because we never add sentinel for the original function.

Also the term binary function is a bit confusing too. Intuitively a binary function has 1:1 mapping to source function, which is how BOLT defines binary function I believe. I also didn't find precedence in LLVM/MC where these funclets are called binary function. Can we make it straightforward and just call it split functions in code and comments?

Otherwise this looks good. Thanks for working on relocation removal.

llvm/include/llvm/MC/MCPseudoProbe.h
17–20

Ok, makes sense. Maybe clarify this in this comment section as well?

34–36

How about adding a TODO comment to remove this later?

hoy added a comment.Oct 25 2022, 1:22 PM

For those don't, a sentinel probe is emitted for each of the binary functions with a different name from the source.

Nit but it indeed caused confusion for me: if the definition of binary function is each MCSymbol, the above isn't accurate because we never add sentinel for the original function.

I see. Will just remove "Most of the source functions end up with only one binary function. For those don't, ".

Also the term binary function is a bit confusing too. Intuitively a binary function has 1:1 mapping to source function, which is how BOLT defines binary function I believe. I also didn't find precedence in LLVM/MC where these funclets are called binary function. Can we make it straightforward and just call it split functions in code and comments?

Otherwise this looks good. Thanks for working on relocation removal.

How about using code ranges instead of binary functions?

llvm/include/llvm/MC/MCPseudoProbe.h
17–20

Done.

34–36

Sounds good.

hoy updated this revision to Diff 470603.Oct 25 2022, 1:23 PM

Updating comments.

hoy edited the summary of this revision. (Show Details)Oct 25 2022, 2:38 PM

How about using code ranges instead of binary functions?

code range is very general. I'd just use split function to make it straightforward. otoh, if BOLT uses the term binary function to represent those split funclets, it would be ok to do the same here too.

hoy updated this revision to Diff 470682.Oct 25 2022, 7:34 PM

Updating D135912: [PseudoProbe] Replace relocation with offset for entry probe.

wenlei accepted this revision.Oct 26 2022, 10:37 AM

lgtm, thanks.

llvm/include/llvm/IR/PseudoProbe.h
33

nit on comment: split function entry address? main body doesn't have it.

llvm/lib/MC/MCPseudoProbe.cpp
80–81

nit on comment: function -> split function?

This revision is now accepted and ready to land.Oct 26 2022, 10:37 AM
hoy added inline comments.Oct 26 2022, 11:02 AM
llvm/include/llvm/IR/PseudoProbe.h
33

Sounds good. Strictly speaking, main body can also have a sentinel probe if its source name doesn't equal binary name, like with the .llvm. suffix, but split function here would make it more obvious.

llvm/lib/MC/MCPseudoProbe.cpp
80–81

Done.

hoy updated this revision to Diff 470872.Oct 26 2022, 11:06 AM

Renaming

This revision was landed with ongoing or failed builds.Oct 27 2022, 1:28 PM
This revision was automatically updated to reflect the committed changes.