[XRay] Use preallocated memory for XRay profiling

This change, which builds upon D54989, removes memory allocation from the
critical path of the profiling implementation. It also changes the API of
the profile collection service so that the service takes ownership of each
thread's memory and associated data structures.
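To illustrate the ownership change described above, here is a minimal sketch in which a collector's `post()` takes ownership of a thread's profiling data by moving it in. `ThreadProfile` and `ProfileCollector` are hypothetical names, and `std::unique_ptr` stands in for whatever ownership mechanism the real XRay runtime uses; this is not the actual collector API.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical per-thread profiling state (illustrative only).
struct ThreadProfile {
  int thread_id = 0;
  std::vector<unsigned char> storage;  // memory backing this thread's data
};

// Hypothetical collector: post() takes ownership of the thread's data, so
// the posting thread neither copies nor frees it afterwards.
class ProfileCollector {
 public:
  void post(std::unique_ptr<ThreadProfile> profile) {
    std::lock_guard<std::mutex> lock(mu_);
    profiles_.push_back(std::move(profile));
  }

  std::size_t size() const {
    std::lock_guard<std::mutex> lock(mu_);
    return profiles_.size();
  }

 private:
  mutable std::mutex mu_;
  std::vector<std::unique_ptr<ThreadProfile>> profiles_;
};
```

After `post(std::move(p))`, the caller's pointer is null and the collector alone is responsible for the thread's memory.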

Consolidating memory allocation allows us to do two things:

  • Limit the amount of memory used by the profiling implementation, by associating preallocated buffers instead of allocating memory on demand.
  • Consolidate memory initialisation and cleanup by relying on the buffer queue's reference-counting implementation.
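The two points above can be sketched as a fixed pool of buffers carved out of one up-front allocation, where each handed-out buffer shares ownership of the backing storage. This is a hypothetical illustration: `BufferQueue`, `getBuffer`, and `releaseBuffer` here are illustrative names, and `std::shared_ptr` is a stand-in for the buffer queue's own reference counting, not the compiler-rt implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <mutex>
#include <vector>

class BufferQueue {
 public:
  // Handle to one preallocated buffer. It shares ownership of the arena, so
  // the arena outlives every buffer that still points into it.
  struct Buffer {
    std::shared_ptr<std::vector<unsigned char>> arena;
    unsigned char* data = nullptr;
    std::size_t size = 0;
    std::size_t slot = 0;
  };

  // All memory is allocated once, here, rather than on demand.
  BufferQueue(std::size_t buffer_size, std::size_t count)
      : buffer_size_(buffer_size),
        arena_(std::make_shared<std::vector<unsigned char>>(buffer_size * count)),
        used_(count, false) {}

  // Hands out a free buffer; returns false once all buffers are in use
  // ("memory exhausted") instead of allocating more memory.
  bool getBuffer(Buffer& out) {
    std::lock_guard<std::mutex> lock(mu_);
    for (std::size_t i = 0; i < used_.size(); ++i) {
      if (!used_[i]) {
        used_[i] = true;
        out = Buffer{arena_, arena_->data() + i * buffer_size_, buffer_size_, i};
        return true;
      }
    }
    return false;
  }

  // Returns the buffer's slot to the pool and drops its arena reference.
  void releaseBuffer(Buffer& b) {
    std::lock_guard<std::mutex> lock(mu_);
    used_[b.slot] = false;
    b = Buffer{};
  }

  // Owners of the arena: the queue itself plus every outstanding buffer.
  long arenaRefs() const { return arena_.use_count(); }

 private:
  std::size_t buffer_size_;
  std::shared_ptr<std::vector<unsigned char>> arena_;
  std::vector<bool> used_;
  std::mutex mu_;
};
```

Because every buffer holds a reference to the arena, cleanup happens in one place: the storage is freed only when the queue and all outstanding buffers have been released.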

We also found a number of places with problematic behaviour,
including:

  • Off-by-factor bug in the allocator implementation.
  • Incorrect unrolling semantics in "memory exhausted" situations when managing the state of the function call trie.
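One way to keep function entry and exit balanced when memory runs out is sketched below: a bounded shadow stack that, when full, counts skipped entries so the matching exits unwind that counter instead of popping frames that were never pushed. This is a hypothetical illustration of the general problem, not the function call trie's actual unrolling logic; `BoundedShadowStack` and its methods are illustrative names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical bounded shadow stack with a fixed capacity.
class BoundedShadowStack {
 public:
  explicit BoundedShadowStack(std::size_t capacity) : capacity_(capacity) {
    frames_.reserve(capacity);  // preallocate; never grow past capacity
  }

  void enter(std::int32_t func_id) {
    if (frames_.size() == capacity_) {
      ++skipped_;  // "memory exhausted": remember that no frame was pushed
      return;
    }
    frames_.push_back(func_id);
  }

  void exit(std::int32_t func_id) {
    if (skipped_ > 0) {
      --skipped_;  // matches a skipped enter(); leave recorded frames intact
      return;
    }
    // Pop only if the exit matches the innermost recorded frame.
    if (!frames_.empty() && frames_.back() == func_id) frames_.pop_back();
  }

  std::size_t depth() const { return frames_.size(); }
  std::size_t skipped() const { return skipped_; }

 private:
  std::size_t capacity_;
  std::vector<std::int32_t> frames_;
  std::size_t skipped_ = 0;
};
```

Without the skip counter, the exit of a function whose entry could not be recorded would pop its caller's frame and leave the stack out of step with the real call stack.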

We also add test cases to the segmented array and profile collector unit
tests that verify our understanding of the system's behaviour, covering
important edge cases (especially memory exhaustion).

Depends on D54989.

Reviewers: mboerger

Subscribers: dschuff, mgorny, dmgreen, jfb, llvm-commits

Differential Revision: https://reviews.llvm.org/D55249