Details

Reviewers

Commits

rG21d4a1eec73d: [XRay][compiler-rt] Avoid InternalAlloc(...) in Profiling Mode
rCRT339978: [XRay][compiler-rt] Avoid InternalAlloc(...) in Profiling Mode
rL339978: [XRay][compiler-rt] Avoid InternalAlloc(...) in Profiling Mode

Summary

We avoid using dynamic memory allocated with the internal allocator in
the profile collection service used by profiling mode. We use aligned
storage for globals and in-struct storage of objects we dynamically
initialize.

We also remove the dependency on Vector<...> which also internally
uses the dynamic allocator in sanitizer_common (InternalAlloc) in favour
of the XRay allocator and segmented array implementation.

This change addresses llvm.org/PR38577.

Diff Detail

Repository: rL LLVM

Event Timeline

dberris created this revision. · Aug 15 2018, 8:10 AM

Remove dependency on Vector<...> as well.

Harbormaster completed remote builds in B21513: Diff 160822. · Aug 15 2018, 9:02 AM

Update to add correct includes for Array<...> and Allocator<...>.

Harbormaster completed remote builds in B21515: Diff 160830. · Aug 15 2018, 9:25 AM

dberris mentioned this in D48879: [XRay][test-suite] Benchmarks for profiling mode implementation. · Aug 16 2018, 2:20 AM

I'm trying to understand the memory allocation/deallocation dynamics of this module. It looks like all of the Array<...> objects only have their memory allocation increase because the XRay segmented array doesn't support having its allocation shrink. However, the memory buffers that each cell in ProfileBuffers points to do get deallocated inside serialize(). Is this correct? What makes these buffers special, so that they are deallocated rather than kept around for reuse later like all the other memory?

compiler-rt/lib/xray/xray_profile_collector.cc
Line 64 (on Diff #160830): Do you mean dynamic memory allocation? Why does alignment affect whether it's done?
Line 84 (on Diff #160830): The fd argument should be set to -1.
Line 216 (on Diff #160830): Suggest keeping the comments, as this one went away, but the "then repopulate..." stayed.
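
For context on the line 84 remark: an anonymous mapping is not backed by a file, so the conventional value for the fd argument is -1 with an offset of 0. The following is a minimal sketch using the public POSIX `mmap`/`munmap` calls rather than the sanitizer-internal wrappers used in the patch; the names `allocate_buffer` and `deallocate_buffer` are hypothetical stand-ins, not the functions from the diff.

```cpp
#include <sys/mman.h>

#include <cstddef>
#include <cstring>

// Hypothetical stand-in for the patch's buffer allocation, written against
// plain POSIX mmap instead of the sanitizer-internal wrapper.
void *allocate_buffer(std::size_t size) {
  // Anonymous mapping: no backing file, so fd is -1 and the offset is 0.
  void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
                 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
  // mmap reports failure with MAP_FAILED, not with a null pointer.
  return p == MAP_FAILED ? nullptr : p;
}

// The size must be passed back because munmap works on address ranges.
void deallocate_buffer(void *p, std::size_t size) {
  if (p != nullptr)
    munmap(p, size);
}
```

A caller has to remember the size it asked for, which is presumably why the ProfileBuffer struct keeps a pointer together with a size.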

Address comments by eizan@.

In D50782#1202318, @eizan wrote:

I'm trying to understand the memory allocation/deallocation dynamics of this module. It looks like all of the Array<...> objects only have their memory allocation increase because the XRay segmented array doesn't support having its allocation shrink.

The Array<...> objects only grow, yes, but we're able to trim them and re-use the memory that's in the internal freelist for the array segments.

However, the memory buffers that each cell in ProfileBuffers points to do get deallocated inside serialize(). Is this correct? What makes these buffers special, so that they are deallocated rather than kept around for reuse later like all the other memory?

The actual buffers are obtained through mmap, and they are not fixed-size: the size of each buffer depends on how large the serialised version of the function call tries will be. We can't re-use these buffers across multiple profile collection sessions. The ProfileBuffers array hosts structs that are fixed-size (each is a pointer and a size).

Note that in the reset() function, we destroy the allocators and re-initialize them (through placement new). The static storage for the allocators and the arrays is effectively re-used, without having to reach for memory from the heap (all of the storage for the Array<...> instances will be obtained through the Allocator<...> instances). If you look at Allocator<...>, the destructor will return the memory to the system as well.
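
To illustrate the grow-only, trim-and-reuse behaviour described in this reply, here is a simplified sketch of a segmented array that parks trimmed segments on a freelist and hands them out again on the next append. It is not the XRay Array<...> implementation: the class `SegmentedArray`, its members, and the use of plain `new`/`delete` (where XRay uses its own allocator) are all assumptions made for illustration. The early return for `trim(0)` mirrors the guard this patch adds to xray_segmented_array.h.

```cpp
#include <cstddef>
#include <new>

// Simplified illustration only. Intended for trivially copyable element types.
template <typename T, std::size_t N = 4> class SegmentedArray {
  struct Segment {
    Segment *Prev = nullptr;
    T Items[N];
  };

  Segment *Tail = nullptr;     // last segment currently in use
  Segment *Freelist = nullptr; // segments released by trim()
  std::size_t Size = 0;
  std::size_t Allocations = 0; // number of fresh segments ever allocated

  Segment *newSegment() {
    if (Freelist != nullptr) { // prefer a recycled segment
      Segment *S = Freelist;
      Freelist = S->Prev;
      return S;
    }
    Segment *S = new (std::nothrow) Segment();
    if (S != nullptr)
      ++Allocations;
    return S;
  }

  static void freeChain(Segment *S) {
    while (S != nullptr) {
      Segment *Next = S->Prev;
      delete S;
      S = Next;
    }
  }

public:
  SegmentedArray() = default;
  SegmentedArray(const SegmentedArray &) = delete;
  SegmentedArray &operator=(const SegmentedArray &) = delete;
  ~SegmentedArray() {
    freeChain(Tail);
    freeChain(Freelist);
  }

  std::size_t size() const { return Size; }
  std::size_t allocations() const { return Allocations; }

  // Appends a copy of V; returns nullptr if no segment could be obtained.
  T *append(const T &V) {
    if (Size % N == 0) { // the tail is full, or there is no tail yet
      Segment *S = newSegment();
      if (S == nullptr)
        return nullptr;
      S->Prev = Tail;
      Tail = S;
    }
    T *Slot = &Tail->Items[Size % N];
    *Slot = V;
    ++Size;
    return Slot;
  }

  // Removes elements from the end. Emptied segments move to the freelist
  // instead of being returned to the system.
  void trim(std::size_t Elements) {
    if (Elements == 0) // nothing to do; also safe on an empty array
      return;
    if (Elements > Size)
      Elements = Size;
    const std::size_t NewSize = Size - Elements;
    std::size_t Have = (Size + N - 1) / N;
    const std::size_t Needed = (NewSize + N - 1) / N;
    for (; Have > Needed; --Have) {
      Segment *S = Tail;
      Tail = S->Prev;
      S->Prev = Freelist;
      Freelist = S;
    }
    Size = NewSize;
  }
};
```

With this shape, calling `trim(size())` and then refilling the array to its previous size needs no fresh segments, which is the property the reply relies on when it says the arrays only grow but their memory is re-used.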

compiler-rt/lib/xray/xray_profile_collector.cc
Line 64 (on Diff #160830): I meant dynamic initialisation. The alignment is important so that the pointers we get when we reinterpret-cast will be of the expected alignment for an object of an appropriate size. We need that to make the placement new calls have well-defined semantics (and the pointer is appropriately aligned). We need this to be global program-duration (static) storage only, because we want to avoid relying on C++ ABI functions for registering dynamic initialisation and de-initialisation (constructor and destructor) routines.
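
The pattern described here can be sketched in isolation as follows. This is an illustration under stated assumptions, not the code from the patch: `Resource`, `ResourceStorage`, `Global` and `resetGlobal` are hypothetical names, and an `alignas` byte array is used in place of the `std::aligned_storage` type that the diff uses.

```cpp
#include <cstddef>
#include <new>

// Hypothetical stand-in for an object with a non-trivial constructor and
// destructor, such as an allocator that owns memory.
struct Resource {
  static int Live; // number of currently constructed instances
  std::size_t MaxMemory;
  explicit Resource(std::size_t Max) : MaxMemory(Max) { ++Live; }
  ~Resource() { --Live; }
};
int Resource::Live = 0;

// Raw bytes with the size and alignment of Resource. A plain byte array needs
// no constructor to run at start-up and no destructor registered for exit.
alignas(Resource) static unsigned char ResourceStorage[sizeof(Resource)];
static Resource *Global = nullptr;

// Destroys the previous instance, if any, then constructs a fresh one in the
// same static storage. No heap allocation is involved.
void resetGlobal(std::size_t Max) {
  if (Global != nullptr)
    Global->~Resource();
  Global = new (ResourceStorage) Resource(Max);
}
```

Every call to `resetGlobal` yields an object at the same address, so the static storage is re-used across resets. The sketch leaves out locking; in the patch the equivalent work happens while GlobalMutex is held.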

In reply to D50782#1202492 (@dberris, above), @eizan wrote:

When/how often does serialize() get called during runtime? Should we be worried about making a number of munmap/mmap system calls while the task being profiled is running?

In D50782#1203533, @eizan wrote:

When/how often does serialize() get called during runtime? Should we be worried about making a number of munmap/mmap system calls while the task being profiled is running?

This is controlled by calls to __xray_log_finalize(...), which only happens on demand and when the process is shutting down. I have no reason to believe that making a number of mmap/munmap syscalls as part of the normal course of serializing profiles is a huge concern, because they're not in the critical path.

eizan accepted this revision. · Aug 16 2018, 6:48 PM

This revision is now accepted and ready to land. · Aug 16 2018, 6:48 PM

Closed by commit rL339978: [XRay][compiler-rt] Avoid InternalAlloc(...) in Profiling Mode (authored by dberris). · Aug 16 2018, 6:58 PM

This revision was automatically updated to reflect the committed changes.

Herald added a subscriber: delcypher. · Aug 16 2018, 6:58 PM

Diff 161154

compiler-rt/trunk/lib/xray/xray_profile_collector.cc

//===-- xray_profile_collector.cc ------------------------------*- C++ -*-===//		//===-- xray_profile_collector.cc ------------------------------*- C++ -*-===//
//		//
// The LLVM Compiler Infrastructure		// The LLVM Compiler Infrastructure
//		//
// This file is distributed under the University of Illinois Open Source		// This file is distributed under the University of Illinois Open Source
// License. See LICENSE.TXT for details.		// License. See LICENSE.TXT for details.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
//		//
// This file is a part of XRay, a dynamic runtime instrumentation system.		// This file is a part of XRay, a dynamic runtime instrumentation system.
//		//
// This implements the interface for the profileCollectorService.		// This implements the interface for the profileCollectorService.
//		//
//===----------------------------------------------------------------------===//		//===----------------------------------------------------------------------===//
#include "xray_profile_collector.h"		#include "xray_profile_collector.h"
#include "sanitizer_common/sanitizer_allocator_internal.h"
#include "sanitizer_common/sanitizer_common.h"		#include "sanitizer_common/sanitizer_common.h"
#include "sanitizer_common/sanitizer_vector.h"		#include "xray_allocator.h"
#include "xray_profiling_flags.h"		#include "xray_profiling_flags.h"
		#include "xray_segmented_array.h"
#include <memory>		#include <memory>
#include <pthread.h>		#include <pthread.h>
#include <utility>		#include <utility>

namespace __xray {		namespace __xray {
namespace profileCollectorService {		namespace profileCollectorService {

namespace {		namespace {

SpinMutex GlobalMutex;		SpinMutex GlobalMutex;
struct ThreadTrie {		struct ThreadTrie {
tid_t TId;		tid_t TId;
FunctionCallTrie *Trie;		typename std::aligned_storage<sizeof(FunctionCallTrie)>::type TrieStorage;
};		};

struct ProfileBuffer {		struct ProfileBuffer {
void *Data;		void *Data;
size_t Size;		size_t Size;
};		};

// Current version of the profile format.		// Current version of the profile format.
[10 unchanged lines not shown]
};		};

struct BlockHeader {		struct BlockHeader {
u32 BlockSize;		u32 BlockSize;
u32 BlockNum;		u32 BlockNum;
u64 ThreadId;		u64 ThreadId;
};		};

// These need to be pointers that point to heap/internal-allocator-allocated		using ThreadTriesArray = Array<ThreadTrie>;
// objects because these are accessed even at program exit.		using ProfileBufferArray = Array<ProfileBuffer>;
Vector<ThreadTrie> *ThreadTries = nullptr;		using ThreadTriesArrayAllocator = typename ThreadTriesArray::AllocatorType;
Vector<ProfileBuffer> *ProfileBuffers = nullptr;		using ProfileBufferArrayAllocator = typename ProfileBufferArray::AllocatorType;
FunctionCallTrie::Allocators *GlobalAllocators = nullptr;
		// These need to be global aligned storage to avoid dynamic initialization. We
		// need these to be aligned to allow us to placement new objects into the
		// storage, and have pointers to those objects be appropriately aligned.
		static typename std::aligned_storage<sizeof(FunctionCallTrie::Allocators)>::type
		AllocatorStorage;
		static typename std::aligned_storage<sizeof(ThreadTriesArray)>::type
		ThreadTriesStorage;
		static typename std::aligned_storage<sizeof(ProfileBufferArray)>::type
		ProfileBuffersStorage;
		static typename std::aligned_storage<sizeof(ThreadTriesArrayAllocator)>::type
		ThreadTriesArrayAllocatorStorage;
		static typename std::aligned_storage<sizeof(ProfileBufferArrayAllocator)>::type
		ProfileBufferArrayAllocatorStorage;

		static ThreadTriesArray *ThreadTries = nullptr;
		static ThreadTriesArrayAllocator *ThreadTriesAllocator = nullptr;
		static ProfileBufferArray *ProfileBuffers = nullptr;
		static ProfileBufferArrayAllocator *ProfileBuffersAllocator = nullptr;
		static FunctionCallTrie::Allocators *GlobalAllocators = nullptr;

		static void *allocateBuffer(size_t S) {
		auto B = reinterpret_cast<void *>(internal_mmap(
		NULL, S, PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0));
		if (B == MAP_FAILED) {
		if (Verbosity())
		Report("XRay Profiling: Failed to allocate memory of size %d.\n", S);
		return nullptr;
		}
		return B;
		}

		static void deallocateBuffer(void *B, size_t S) {
		if (B == nullptr)
		return;
		internal_munmap(B, S);
		}

} // namespace		} // namespace

void post(const FunctionCallTrie &T, tid_t TId) {		void post(const FunctionCallTrie &T, tid_t TId) {
static pthread_once_t Once = PTHREAD_ONCE_INIT;		static pthread_once_t Once = PTHREAD_ONCE_INIT;
pthread_once(&Once, +[] {		pthread_once(&Once, +[] { reset(); });
SpinMutexLock Lock(&GlobalMutex);
GlobalAllocators = reinterpret_cast<FunctionCallTrie::Allocators *>(
InternalAlloc(sizeof(FunctionCallTrie::Allocators)));
new (GlobalAllocators) FunctionCallTrie::Allocators();
*GlobalAllocators = FunctionCallTrie::InitAllocatorsCustom(
profilingFlags()->global_allocator_max);
ThreadTries = reinterpret_cast<Vector<ThreadTrie> *>(
InternalAlloc(sizeof(Vector<ThreadTrie>)));
new (ThreadTries) Vector<ThreadTrie>();
ProfileBuffers = reinterpret_cast<Vector<ProfileBuffer> *>(
InternalAlloc(sizeof(Vector<ProfileBuffer>)));
new (ProfileBuffers) Vector<ProfileBuffer>();
});
DCHECK_NE(GlobalAllocators, nullptr);
DCHECK_NE(ThreadTries, nullptr);
DCHECK_NE(ProfileBuffers, nullptr);

ThreadTrie *Item = nullptr;		ThreadTrie *Item = nullptr;
{		{
SpinMutexLock Lock(&GlobalMutex);		SpinMutexLock Lock(&GlobalMutex);
if (GlobalAllocators == nullptr)		if (GlobalAllocators == nullptr || ThreadTries == nullptr)
return;		return;

Item = ThreadTries->PushBack();		Item = ThreadTries->Append({});
Item->TId = TId;		Item->TId = TId;
		auto Trie = reinterpret_cast<FunctionCallTrie *>(&Item->TrieStorage);
// Here we're using the internal allocator instead of the managed allocator		new (Trie) FunctionCallTrie(*GlobalAllocators);
// because:
//
// 1) We're not using the segmented array data structure to host
// FunctionCallTrie objects. We're using a Vector (from sanitizer_common)
// which works like a std::vector<...> keeping elements contiguous in
// memory. The segmented array data structure assumes that elements are
// trivially destructible, where FunctionCallTrie isn't.
//
// 2) Using a managed allocator means we need to manage that separately,
// which complicates the nature of this code. To get around that, we're
// using the internal allocator instead, which has its own global state
// and is decoupled from the lifetime management required by the managed
// allocator we have in XRay.
//
Item->Trie = reinterpret_cast<FunctionCallTrie *>(InternalAlloc(
sizeof(FunctionCallTrie), nullptr, alignof(FunctionCallTrie)));
DCHECK_NE(Item->Trie, nullptr);
new (Item->Trie) FunctionCallTrie(*GlobalAllocators);
}		}

T.deepCopyInto(*Item->Trie);		auto Trie = reinterpret_cast<FunctionCallTrie *>(&Item->TrieStorage);
		T.deepCopyInto(*Trie);
}		}

// A PathArray represents the function id's representing a stack trace. In this		// A PathArray represents the function id's representing a stack trace. In this
// context a path is almost always represented from the leaf function in a call		// context a path is almost always represented from the leaf function in a call
// stack to a root of the call trie.		// stack to a root of the call trie.
using PathArray = Array<int32_t>;		using PathArray = Array<int32_t>;

struct ProfileRecord {		struct ProfileRecord {
using PathAllocator = typename PathArray::AllocatorType;		using PathAllocator = typename PathArray::AllocatorType;

// The Path in this record is the function id's from the leaf to the root of		// The Path in this record is the function id's from the leaf to the root of
// the function call stack as represented from a FunctionCallTrie.		// the function call stack as represented from a FunctionCallTrie.
PathArray *Path = nullptr;		PathArray Path;
const FunctionCallTrie::Node *Node = nullptr;		const FunctionCallTrie::Node *Node = nullptr;

// Constructor for in-place construction.		// Constructor for in-place construction.
ProfileRecord(PathAllocator &A, const FunctionCallTrie::Node *N)		ProfileRecord(PathAllocator &A, const FunctionCallTrie::Node *N)
: Path([&] {		: Path(A), Node(N) {}
auto P =
reinterpret_cast<PathArray *>(InternalAlloc(sizeof(PathArray)));
new (P) PathArray(A);
return P;
}()),
Node(N) {}
};		};

namespace {		namespace {

using ProfileRecordArray = Array<ProfileRecord>;		using ProfileRecordArray = Array<ProfileRecord>;

// Walk a depth-first traversal of each root of the FunctionCallTrie to generate		// Walk a depth-first traversal of each root of the FunctionCallTrie to generate
// the path(s) and the data associated with the path.		// the path(s) and the data associated with the path.
[12 unchanged lines not shown; context: while (!DFSStack.empty()) {]
auto Record = PRs.AppendEmplace(PA, Node);		auto Record = PRs.AppendEmplace(PA, Node);
if (Record == nullptr)		if (Record == nullptr)
return;		return;
DCHECK_NE(Record, nullptr);		DCHECK_NE(Record, nullptr);

// Traverse the Node's parents and as we're doing so, get the FIds in		// Traverse the Node's parents and as we're doing so, get the FIds in
// the order they appear.		// the order they appear.
for (auto N = Node; N != nullptr; N = N->Parent)		for (auto N = Node; N != nullptr; N = N->Parent)
Record->Path->Append(N->FId);		Record->Path.Append(N->FId);
DCHECK(!Record->Path->empty());		DCHECK(!Record->Path.empty());

for (const auto C : Node->Callees)		for (const auto C : Node->Callees)
DFSStack.Append(C.NodePtr);		DFSStack.Append(C.NodePtr);
}		}
}		}
}		}

static void serializeRecords(ProfileBuffer *Buffer, const BlockHeader &Header,		static void serializeRecords(ProfileBuffer *Buffer, const BlockHeader &Header,
const ProfileRecordArray &ProfileRecords) {		const ProfileRecordArray &ProfileRecords) {
auto NextPtr = static_cast<char *>(		auto NextPtr = static_cast<char *>(
internal_memcpy(Buffer->Data, &Header, sizeof(Header))) +		internal_memcpy(Buffer->Data, &Header, sizeof(Header))) +
sizeof(Header);		sizeof(Header);
for (const auto &Record : ProfileRecords) {		for (const auto &Record : ProfileRecords) {
// List of IDs follow:		// List of IDs follow:
for (const auto FId : *Record.Path)		for (const auto FId : Record.Path)
NextPtr =		NextPtr =
static_cast<char *>(internal_memcpy(NextPtr, &FId, sizeof(FId))) +		static_cast<char *>(internal_memcpy(NextPtr, &FId, sizeof(FId))) +
sizeof(FId);		sizeof(FId);

// Add the sentinel here.		// Add the sentinel here.
constexpr int32_t SentinelFId = 0;		constexpr int32_t SentinelFId = 0;
NextPtr = static_cast<char *>(		NextPtr = static_cast<char *>(
internal_memset(NextPtr, SentinelFId, sizeof(SentinelFId))) +		internal_memset(NextPtr, SentinelFId, sizeof(SentinelFId))) +
[13 unchanged lines not shown; context: static void serializeRecords(ProfileBuffer *Buffer, const BlockHeader &Header,]
DCHECK_EQ(NextPtr - static_cast<char *>(Buffer->Data), Buffer->Size);		DCHECK_EQ(NextPtr - static_cast<char *>(Buffer->Data), Buffer->Size);
}		}

} // namespace		} // namespace

void serialize() {		void serialize() {
SpinMutexLock Lock(&GlobalMutex);		SpinMutexLock Lock(&GlobalMutex);

// Clear out the global ProfileBuffers.		if (GlobalAllocators == nullptr || ThreadTries == nullptr ||
for (uptr I = 0; I < ProfileBuffers->Size(); ++I)		ProfileBuffers == nullptr)
InternalFree((*ProfileBuffers)[I].Data);		return;
ProfileBuffers->Reset();
		// Clear out the global ProfileBuffers, if it's not empty.
		for (auto &B : *ProfileBuffers)
		deallocateBuffer(B.Data, B.Size);
		ProfileBuffers->trim(ProfileBuffers->size());

if (ThreadTries->Size() == 0)		if (ThreadTries->empty())
return;		return;

// Then repopulate the global ProfileBuffers.		// Then repopulate the global ProfileBuffers.
for (u32 I = 0; I < ThreadTries->Size(); ++I) {		u32 I = 0;
		for (const auto &ThreadTrie : *ThreadTries) {
using ProfileRecordAllocator = typename ProfileRecordArray::AllocatorType;		using ProfileRecordAllocator = typename ProfileRecordArray::AllocatorType;
ProfileRecordAllocator PRAlloc(profilingFlags()->global_allocator_max);		ProfileRecordAllocator PRAlloc(profilingFlags()->global_allocator_max);
ProfileRecord::PathAllocator PathAlloc(		ProfileRecord::PathAllocator PathAlloc(
profilingFlags()->global_allocator_max);		profilingFlags()->global_allocator_max);
ProfileRecordArray ProfileRecords(PRAlloc);		ProfileRecordArray ProfileRecords(PRAlloc);

// First, we want to compute the amount of space we're going to need. We'll		// First, we want to compute the amount of space we're going to need. We'll
// use a local allocator and an __xray::Array<...> to store the intermediary		// use a local allocator and an __xray::Array<...> to store the intermediary
// data, then compute the size as we're going along. Then we'll allocate the		// data, then compute the size as we're going along. Then we'll allocate the
// contiguous space to contain the thread buffer data.		// contiguous space to contain the thread buffer data.
const auto &Trie = *(*ThreadTries)[I].Trie;		const auto &Trie =
		*reinterpret_cast<const FunctionCallTrie *>(&(ThreadTrie.TrieStorage));
if (Trie.getRoots().empty())		if (Trie.getRoots().empty())
continue;		continue;

populateRecords(ProfileRecords, PathAlloc, Trie);		populateRecords(ProfileRecords, PathAlloc, Trie);
DCHECK(!Trie.getRoots().empty());		DCHECK(!Trie.getRoots().empty());
DCHECK(!ProfileRecords.empty());		DCHECK(!ProfileRecords.empty());

// Go through each record, to compute the sizes.		// Go through each record, to compute the sizes.
//		//
// header size = block size (4 bytes)		// header size = block size (4 bytes)
// + block number (4 bytes)		// + block number (4 bytes)
// + thread id (8 bytes)		// + thread id (8 bytes)
// record size = path ids (4 bytes * number of ids + sentinel 4 bytes)		// record size = path ids (4 bytes * number of ids + sentinel 4 bytes)
// + call count (8 bytes)		// + call count (8 bytes)
// + local time (8 bytes)		// + local time (8 bytes)
// + end of record (8 bytes)		// + end of record (8 bytes)
u32 CumulativeSizes = 0;		u32 CumulativeSizes = 0;
for (const auto &Record : ProfileRecords)		for (const auto &Record : ProfileRecords)
CumulativeSizes += 20 + (4 * Record.Path->size());		CumulativeSizes += 20 + (4 * Record.Path.size());

BlockHeader Header{16 + CumulativeSizes, I, (*ThreadTries)[I].TId};		BlockHeader Header{16 + CumulativeSizes, I++, ThreadTrie.TId};
auto Buffer = ProfileBuffers->PushBack();		auto Buffer = ProfileBuffers->Append({});
Buffer->Size = sizeof(Header) + CumulativeSizes;		Buffer->Size = sizeof(Header) + CumulativeSizes;
Buffer->Data = InternalAlloc(Buffer->Size, nullptr, 64);		Buffer->Data = allocateBuffer(Buffer->Size);
DCHECK_NE(Buffer->Data, nullptr);		DCHECK_NE(Buffer->Data, nullptr);
serializeRecords(Buffer, Header, ProfileRecords);		serializeRecords(Buffer, Header, ProfileRecords);

// Now clean up the ProfileRecords array, one at a time.
for (auto &Record : ProfileRecords) {
Record.Path->~PathArray();
InternalFree(Record.Path);
}
}		}
}		}

void reset() {		void reset() {
SpinMutexLock Lock(&GlobalMutex);		SpinMutexLock Lock(&GlobalMutex);

if (ProfileBuffers != nullptr) {		if (ProfileBuffers != nullptr) {
// Clear out the profile buffers that have been serialized.		// Clear out the profile buffers that have been serialized.
for (uptr I = 0; I < ProfileBuffers->Size(); ++I)		for (auto &B : *ProfileBuffers)
InternalFree((*ProfileBuffers)[I].Data);		deallocateBuffer(B.Data, B.Size);
ProfileBuffers->Reset();		ProfileBuffers->trim(ProfileBuffers->size());
InternalFree(ProfileBuffers);
ProfileBuffers = nullptr;
}		}

if (ThreadTries != nullptr) {		if (ThreadTries != nullptr) {
// Clear out the function call tries per thread.		// Clear out the function call tries per thread.
for (uptr I = 0; I < ThreadTries->Size(); ++I) {		for (auto &T : *ThreadTries) {
auto &T = (*ThreadTries)[I];		auto Trie = reinterpret_cast<FunctionCallTrie *>(&T.TrieStorage);
T.Trie->~FunctionCallTrie();		Trie->~FunctionCallTrie();
InternalFree(T.Trie);		}
}		ThreadTries->trim(ThreadTries->size());
ThreadTries->Reset();
InternalFree(ThreadTries);
ThreadTries = nullptr;
}		}

// Reset the global allocators.		// Reset the global allocators.
if (GlobalAllocators != nullptr) {		if (GlobalAllocators != nullptr)
GlobalAllocators->~Allocators();		GlobalAllocators->~Allocators();
InternalFree(GlobalAllocators);
GlobalAllocators = nullptr;		GlobalAllocators =
}		reinterpret_cast<FunctionCallTrie::Allocators *>(&AllocatorStorage);
GlobalAllocators = reinterpret_cast<FunctionCallTrie::Allocators *>(
InternalAlloc(sizeof(FunctionCallTrie::Allocators)));
new (GlobalAllocators) FunctionCallTrie::Allocators();		new (GlobalAllocators) FunctionCallTrie::Allocators();
*GlobalAllocators = FunctionCallTrie::InitAllocators();		*GlobalAllocators = FunctionCallTrie::InitAllocators();
ThreadTries = reinterpret_cast<Vector<ThreadTrie> *>(
InternalAlloc(sizeof(Vector<ThreadTrie>)));		if (ThreadTriesAllocator != nullptr)
new (ThreadTries) Vector<ThreadTrie>();		ThreadTriesAllocator->~ThreadTriesArrayAllocator();
ProfileBuffers = reinterpret_cast<Vector<ProfileBuffer> *>(
InternalAlloc(sizeof(Vector<ProfileBuffer>)));		ThreadTriesAllocator = reinterpret_cast<ThreadTriesArrayAllocator *>(
new (ProfileBuffers) Vector<ProfileBuffer>();		&ThreadTriesArrayAllocatorStorage);
		new (ThreadTriesAllocator)
		ThreadTriesArrayAllocator(profilingFlags()->global_allocator_max);
		ThreadTries = reinterpret_cast<ThreadTriesArray *>(&ThreadTriesStorage);
		new (ThreadTries) ThreadTriesArray(*ThreadTriesAllocator);

		if (ProfileBuffersAllocator != nullptr)
		ProfileBuffersAllocator->~ProfileBufferArrayAllocator();

		ProfileBuffersAllocator = reinterpret_cast<ProfileBufferArrayAllocator *>(
		&ProfileBufferArrayAllocatorStorage);
		new (ProfileBuffersAllocator)
		ProfileBufferArrayAllocator(profilingFlags()->global_allocator_max);
		ProfileBuffers =
		reinterpret_cast<ProfileBufferArray *>(&ProfileBuffersStorage);
		new (ProfileBuffers) ProfileBufferArray(*ProfileBuffersAllocator);
}		}

XRayBuffer nextBuffer(XRayBuffer B) {		XRayBuffer nextBuffer(XRayBuffer B) {
SpinMutexLock Lock(&GlobalMutex);		SpinMutexLock Lock(&GlobalMutex);

if (ProfileBuffers == nullptr || ProfileBuffers->Size() == 0)		if (ProfileBuffers == nullptr || ProfileBuffers->size() == 0)
return {nullptr, 0};		return {nullptr, 0};

static pthread_once_t Once = PTHREAD_ONCE_INIT;		static pthread_once_t Once = PTHREAD_ONCE_INIT;
static typename std::aligned_storage<sizeof(XRayProfilingFileHeader)>::type		static typename std::aligned_storage<sizeof(XRayProfilingFileHeader)>::type
FileHeaderStorage;		FileHeaderStorage;
pthread_once(&Once,		pthread_once(&Once,
+[] { new (&FileHeaderStorage) XRayProfilingFileHeader{}; });		+[] { new (&FileHeaderStorage) XRayProfilingFileHeader{}; });

if (UNLIKELY(B.Data == nullptr)) {		if (UNLIKELY(B.Data == nullptr)) {
// The first buffer should always contain the file header information.		// The first buffer should always contain the file header information.
auto &FileHeader =		auto &FileHeader =
*reinterpret_cast<XRayProfilingFileHeader *>(&FileHeaderStorage);		*reinterpret_cast<XRayProfilingFileHeader *>(&FileHeaderStorage);
FileHeader.Timestamp = NanoTime();		FileHeader.Timestamp = NanoTime();
FileHeader.PID = internal_getpid();		FileHeader.PID = internal_getpid();
return {&FileHeaderStorage, sizeof(XRayProfilingFileHeader)};		return {&FileHeaderStorage, sizeof(XRayProfilingFileHeader)};
}		}

if (UNLIKELY(B.Data == &FileHeaderStorage))		if (UNLIKELY(B.Data == &FileHeaderStorage))
return {(*ProfileBuffers)[0].Data, (*ProfileBuffers)[0].Size};		return {(*ProfileBuffers)[0].Data, (*ProfileBuffers)[0].Size};

BlockHeader Header;		BlockHeader Header;
internal_memcpy(&Header, B.Data, sizeof(BlockHeader));		internal_memcpy(&Header, B.Data, sizeof(BlockHeader));
auto NextBlock = Header.BlockNum + 1;		auto NextBlock = Header.BlockNum + 1;
if (NextBlock < ProfileBuffers->Size())		if (NextBlock < ProfileBuffers->size())
return {(*ProfileBuffers)[NextBlock].Data,		return {(*ProfileBuffers)[NextBlock].Data,
(*ProfileBuffers)[NextBlock].Size};		(*ProfileBuffers)[NextBlock].Size};
return {nullptr, 0};		return {nullptr, 0};
}		}

} // namespace profileCollectorService		} // namespace profileCollectorService
} // namespace __xray		} // namespace __xray
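
As a worked example of the size arithmetic in serialize() above: each record contributes the fixed 20 bytes used in the code plus 4 bytes per function id in its path, and the block header adds 16 bytes. The helper below is a hypothetical sketch that only reproduces that arithmetic; `blockSize` and its parameter are not part of the patch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper mirroring the constants used in serialize().
std::uint32_t blockSize(const std::vector<std::size_t> &PathLengths) {
  constexpr std::uint32_t HeaderBytes = 16;      // u32 + u32 + u64
  constexpr std::uint32_t FixedRecordBytes = 20; // constant used in the diff
  constexpr std::uint32_t BytesPerFId = 4;       // one int32_t function id
  std::uint32_t Cumulative = 0;
  for (std::size_t Len : PathLengths)
    Cumulative +=
        FixedRecordBytes + BytesPerFId * static_cast<std::uint32_t>(Len);
  return HeaderBytes + Cumulative;
}
```

For three records whose paths hold 1, 2 and 3 function ids, the records take 24 + 28 + 32 = 84 bytes, so the buffer requested for that thread would be 16 + 84 = 100 bytes.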

compiler-rt/trunk/lib/xray/xray_segmented_array.h

[319 unchanged lines not shown; context: for (auto I = begin(); I != E; ++I)]
return &(*I);		return &(*I);

return nullptr;		return nullptr;
}		}

/// Remove N Elements from the end. This leaves the blocks behind, and not		/// Remove N Elements from the end. This leaves the blocks behind, and not
/// require allocation of new blocks for new elements added after trimming.		/// require allocation of new blocks for new elements added after trimming.
void trim(size_t Elements) {		void trim(size_t Elements) {
		if (Elements == 0)
		return;

DCHECK_LE(Elements, Size);		DCHECK_LE(Elements, Size);
DCHECK_GT(Size, 0);		DCHECK_GT(Size, 0);
auto OldSize = Size;		auto OldSize = Size;
Size -= Elements;		Size -= Elements;

DCHECK_NE(Head, &SentinelSegment);		DCHECK_NE(Head, &SentinelSegment);
DCHECK_NE(Tail, &SentinelSegment);		DCHECK_NE(Tail, &SentinelSegment);

[40 unchanged lines not shown]

This is an archive of the discontinued LLVM Phabricator instance.

[XRay][compiler-rt] Avoid InternalAlloc(...) in Profiling Mode
Closed · Public