[XRay][profiler] Part 1: XRay Allocator and Array Implementations
Abandoned, Public

Authored by dberris on Apr 18 2018, 1:31 AM.

Details

Reviewers
None
Summary

This change is part of the larger XRay Profiling Mode effort.

Here we implement an arena allocator for the fixed-size buffers used in a
segmented array implementation. This change also adds the segmented array
data structure, which relies on the allocator to provide and maintain
its storage.

Key features of the Allocator type:

  • It hands out cache-aligned blocks that host the actual data. Each block is a contiguous run of bytes whose size is a multiple of the cache line size.
  • The Allocator has a maximum memory budget, set at construction time. This allows us to cap the amount of data each specific Allocator instance is responsible for.
  • Upon destruction, the Allocator releases the storage it has used, handing it back to the internal allocator used in sanitizer_common.
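
The three Allocator properties above can be illustrated with a minimal sketch. It is not the code from this patch: the name BlockAllocator, the 64-byte cache line size, and the use of std::aligned_alloc and std::vector are assumptions made for illustration, whereas the real Allocator is backed by the internal allocator in sanitizer_common.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <vector>

// Assumed cache line size; a real runtime would configure this per target.
constexpr std::size_t kCacheLineSize = 64;

// Hypothetical arena allocator handing out fixed-size, cache-aligned blocks.
// N is the block size in bytes and must be a multiple of the cache line size.
template <std::size_t N>
class BlockAllocator {
  static_assert(N > 0 && N % kCacheLineSize == 0,
                "block size must be a non-zero cache-line multiple");

public:
  // The memory budget is fixed at construction time.
  explicit BlockAllocator(std::size_t Budget) : MaxMemory(Budget) {}
  BlockAllocator(const BlockAllocator &) = delete;
  BlockAllocator &operator=(const BlockAllocator &) = delete;

  // Returns one N-byte block, or nullptr once the budget is exhausted.
  void *Allocate() {
    if (Allocated + N > MaxMemory)
      return nullptr;
    void *Block = std::aligned_alloc(kCacheLineSize, N);
    if (Block == nullptr)
      return nullptr;
    Blocks.push_back(Block);
    Allocated += N;
    return Block;
  }

  std::size_t BytesAllocated() const { return Allocated; }

  // Blocks are never freed individually; everything is released at once.
  ~BlockAllocator() {
    for (void *Block : Blocks)
      std::free(Block);
  }

private:
  const std::size_t MaxMemory;
  std::size_t Allocated = 0;
  std::vector<void *> Blocks;
};
```

With a budget of 256 bytes and 128-byte blocks, this sketch serves two allocations and returns nullptr for the third, which is the capping behaviour the summary describes.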

Key features of the Array type:

  • Each segmented array is always backed by an Allocator, either a user-provided one or a global allocator.
  • When an Array grows, it appends one fixed-size segment. The segment size is determined by the number of elements of type T that fit into a multiple of the cache line size.
  • An Array does not return memory to the Allocator, but it can keep track of the current number of "live" objects it stores.
  • When an Array is destroyed, it will not return memory to the Allocator. Users should clean up the Allocator independently of the Array.
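
As a rough illustration of the Array behaviour listed above, here is a minimal sketch. The names SimpleArena, SegmentedArray, Append and Trim are hypothetical, the 64-byte cache line is an assumption, and std::vector stands in for the bookkeeping that the real implementation does without the standard library.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>
#include <type_traits>
#include <vector>

constexpr std::size_t kCacheLineSize = 64; // assumed cache line size

// Hypothetical stand-in for the Allocator: owns every block it hands out.
struct SimpleArena {
  std::vector<void *> Blocks;
  SimpleArena() = default;
  SimpleArena(const SimpleArena &) = delete;
  SimpleArena &operator=(const SimpleArena &) = delete;
  void *Allocate(std::size_t Bytes) {
    void *B = std::aligned_alloc(kCacheLineSize, Bytes);
    if (B != nullptr)
      Blocks.push_back(B);
    return B;
  }
  ~SimpleArena() {
    for (void *B : Blocks)
      std::free(B);
  }
};

template <class T>
class SegmentedArray {
  static_assert(std::is_trivially_destructible<T>::value,
                "the sketch never runs element destructors");
  // Smallest cache-line multiple that holds at least one T.
  static constexpr std::size_t SegmentBytes =
      ((sizeof(T) + kCacheLineSize - 1) / kCacheLineSize) * kCacheLineSize;
  static constexpr std::size_t PerSegment = SegmentBytes / sizeof(T);

public:
  explicit SegmentedArray(SimpleArena &A) : Arena(A) {}

  // Appends a copy of V, growing by one fixed-size segment when full.
  // Returns nullptr if the arena cannot provide another segment.
  T *Append(const T &V) {
    if (Live == Segments.size() * PerSegment) {
      void *S = Arena.Allocate(SegmentBytes);
      if (S == nullptr)
        return nullptr;
      Segments.push_back(static_cast<T *>(S));
    }
    T *Slot = Segments[Live / PerSegment] + Live % PerSegment;
    new (Slot) T(V);
    ++Live;
    return Slot;
  }

  // Drops the last Count live elements; the segments stay allocated.
  void Trim(std::size_t Count) { Live -= (Count < Live ? Count : Live); }

  T &operator[](std::size_t I) {
    return Segments[I / PerSegment][I % PerSegment];
  }
  std::size_t size() const { return Live; }

  // Intentionally no destructor: the memory belongs to the arena.

private:
  SimpleArena &Arena;
  std::vector<T *> Segments;
  std::size_t Live = 0;
};
```

In this sketch, trimming only lowers the live count, so later appends reuse the segments already obtained, and all memory is reclaimed when the arena itself is destroyed.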

These basic data structures are used by the XRay Profiling Mode
implementation to provide efficient, cache-aware storage for the
read- and write-heavy data that tracks latency information.
We rely on the cache line characteristics of the architecture for
good data isolation and cache friendliness when performing
operations such as searching for elements or updating data
hosted in these cache lines.

[XRay][profiler] Part 2: XRay Function Call Trie

This is part of the larger XRay Profiling Mode effort.

This patch implements a central data structure for capturing statistics
about XRay instrumented function call stacks. The FunctionCallTrie
type does the following things:

  • It keeps track of a shadow function call stack of XRay instrumented functions as they are entered (function enter event) and as they are exited (function exit event).
  • When a function is entered, we push an entry carrying the entry TSC onto the shadow stack and update the trie (or prefix tree) representing the current function call stack. If we haven't encountered this function at this position in the call stack before, this creates a unique node for the function there. We also update the parent function's list of callees to reflect the newly found path.
  • When a function is exited, we compute statistics (TSC deltas, function call counts) for the associated function(s) up the stack as we unwind to find the matching entry event.
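
The enter and exit handling described above can be sketched as follows. This is a simplified illustration, not the patch's FunctionCallTrie: the name CallTrie, its member names, and the use of std::map and std::unique_ptr are assumptions, whereas the real type builds on the Allocator and Array from Part 1.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <memory>
#include <vector>

// Hypothetical, simplified call trie with a shadow stack.
class CallTrie {
public:
  struct Node {
    std::int32_t FuncId = 0;
    std::uint64_t CallCount = 0;
    std::uint64_t CumulativeTSC = 0;
    std::map<std::int32_t, std::unique_ptr<Node>> Callees;
  };

  // Function enter event: find or create the node for FuncId under the
  // current top of the shadow stack, then push it with its entry TSC.
  void enterFunction(std::int32_t FuncId, std::uint64_t TSC) {
    auto &Siblings = Shadow.empty() ? Roots : Shadow.back().N->Callees;
    std::unique_ptr<Node> &Slot = Siblings[FuncId];
    if (!Slot) {
      Slot = std::make_unique<Node>();
      Slot->FuncId = FuncId;
    }
    Shadow.push_back({Slot.get(), TSC});
  }

  // Function exit event: unwind until the matching entry is found,
  // accounting every frame popped along the way.
  void exitFunction(std::int32_t FuncId, std::uint64_t TSC) {
    while (!Shadow.empty()) {
      Entry Top = Shadow.back();
      Shadow.pop_back();
      ++Top.N->CallCount;
      Top.N->CumulativeTSC += TSC - Top.EntryTSC;
      if (Top.N->FuncId == FuncId)
        break;
    }
  }

  // Returns the root node for FuncId, or nullptr if it was never a root.
  const Node *root(std::int32_t FuncId) const {
    auto It = Roots.find(FuncId);
    return It == Roots.end() ? nullptr : It->second.get();
  }

private:
  struct Entry {
    Node *N;
    std::uint64_t EntryTSC;
  };
  std::map<std::int32_t, std::unique_ptr<Node>> Roots;
  std::vector<Entry> Shadow;
};
```

Note that in this sketch the same function called from two different parents gets two distinct nodes, which is what makes the structure a prefix tree over call stacks rather than a flat per-function table.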

This builds upon the XRay Allocator and Array types in Part 1 of
this series of patches.

[XRay][profiler] Part 3: Profile Collector Service

This is part of the larger XRay Profiling Mode effort.

This patch implements a centralised collector for FunctionCallTrie
instances, each associated with a thread. It maintains a global set of
trie instances which can be retrieved through the XRay API for processing
in-memory buffers (when registered). Future changes will add the
wiring for the actual profiling mode.

This central service provides the following functionality:

  • Posting a FunctionCallTrie associated with a thread to the central list of tries.
  • Serializing all the posted FunctionCallTrie instances into in-memory buffers.
  • Resetting the global state of the serialized buffers and tries.
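
The three operations above (post, serialize, reset) might look roughly like the sketch below. Everything in it is an assumption for illustration: ThreadProfile is a hypothetical flattened stand-in for a per-thread FunctionCallTrie, the byte layout of the buffers is invented, and the real service cannot rely on std::mutex or std::vector inside the runtime.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical flattened stand-in for one thread's FunctionCallTrie.
struct ThreadProfile {
  struct Record {
    std::int32_t FuncId;
    std::uint64_t CallCount;
    std::uint64_t CumulativeTSC;
  };
  std::uint64_t ThreadId;
  std::vector<Record> Records;
};

class ProfileCollector {
public:
  // Adds one thread's profile to the central list.
  void post(ThreadProfile Profile) {
    std::lock_guard<std::mutex> Lock(Mutex);
    Posted.push_back(std::move(Profile));
  }

  // Rebuilds one in-memory buffer per posted profile. Invented layout:
  // thread id, record count, then each record's fields in order.
  void serialize() {
    std::lock_guard<std::mutex> Lock(Mutex);
    Buffers.clear();
    for (const ThreadProfile &P : Posted) {
      std::vector<char> Buffer;
      append(Buffer, P.ThreadId);
      append(Buffer, static_cast<std::uint64_t>(P.Records.size()));
      for (const ThreadProfile::Record &R : P.Records) {
        append(Buffer, R.FuncId);
        append(Buffer, R.CallCount);
        append(Buffer, R.CumulativeTSC);
      }
      Buffers.push_back(std::move(Buffer));
    }
  }

  // Returns a copy of the serialized buffers.
  std::vector<std::vector<char>> buffers() const {
    std::lock_guard<std::mutex> Lock(Mutex);
    return Buffers;
  }

  // Clears both the posted profiles and the serialized buffers.
  void reset() {
    std::lock_guard<std::mutex> Lock(Mutex);
    Posted.clear();
    Buffers.clear();
  }

private:
  // Appends the raw bytes of Value to Out (host byte order).
  template <class T>
  static void append(std::vector<char> &Out, const T &Value) {
    const char *Bytes = reinterpret_cast<const char *>(&Value);
    Out.insert(Out.end(), Bytes, Bytes + sizeof(T));
  }

  mutable std::mutex Mutex;
  std::vector<ThreadProfile> Posted;
  std::vector<std::vector<char>> Buffers;
};
```

A single lock around all three operations keeps the sketch simple; it says nothing about how the actual service synchronises threads.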

[XRay][profiler] Part 4: Profiler Mode Wiring

This is part of the larger XRay Profiling Mode effort.

This patch implements the wiring required to actually select the
xray-profiling mode and install the handlers that measure the time
and frequency of the function calls in call stacks. For now, the
way to get the profile information is to call
__xray_process_buffers(...) through the XRay API.

In subsequent changes we'll implement profile saving to files, similar
to how the FDR and basic modes operate, as well as a means of converting
this format into formats that can be loaded and visualised as flame
graphs. We will also be extending the accounting tool in LLVM to support
stack-based function call accounting.

We will also continue the implementation work to support building small
histograms of latencies for the FunctionCallTrie::Node type, so that
we can approximate the distribution of latencies per function.
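
One possible shape for such a per-node histogram is sketched below. This is purely an assumption about how the planned feature could work: the name LatencyHistogram, the 16-bucket size, and the power-of-two bucketing scheme are all hypothetical and do not come from this series.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size latency histogram with power-of-two buckets.
class LatencyHistogram {
public:
  static constexpr std::size_t kBuckets = 16;

  // Bucket 0 holds deltas of 0 or 1, bucket I holds [2^I, 2^(I+1)),
  // and the last bucket also absorbs everything larger.
  static std::size_t bucketFor(std::uint64_t Delta) {
    std::size_t B = 0;
    while (Delta > 1 && B + 1 < kBuckets) {
      Delta >>= 1;
      ++B;
    }
    return B;
  }

  // Records one TSC delta.
  void record(std::uint64_t Delta) { ++Counts[bucketFor(Delta)]; }

  std::uint64_t count(std::size_t Bucket) const { return Counts[Bucket]; }

private:
  std::array<std::uint64_t, kBuckets> Counts{};
};
```

A fixed number of buckets keeps the per-node cost constant, at the price of only coarse resolution for large deltas.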

Depends on D45758.

Event Timeline

dberris created this revision. Apr 18 2018, 1:31 AM
dberris abandoned this revision. Apr 18 2018, 1:32 AM