This is an archive of the discontinued LLVM Phabricator instance.

Differential D31384

[XRay] [compiler-rt] Write buffer length to FDR log before writing buffer.
ClosedPublic

Authored by kpw on Mar 27 2017, 1:20 AM.

Details

Reviewers

dberris
pelikan

Commits

rG9b57ca171dd0: [XRay] [compiler-rt] Write buffer length to FDR log before writing buffer.
rCRT298982: [XRay] [compiler-rt] Write buffer length to FDR log before writing buffer.
rL298982: [XRay] [compiler-rt] Write buffer length to FDR log before writing buffer.

Summary

Currently the FDR log writer, upon flushing, dumps a sequence of buffers from
its freelist to disk. A reader can read the first buffer up to an EOB record,
but then it is unclear how far ahead to scan to find the next thread's traces.

There are a few ways to handle this problem.

1. The reader has externalized knowledge of the buffer size.
2. The size of buffers is in the file header or otherwise encoded in the log.
3. Only write out the portion of the buffer with records. When released, the buffers are marked with a size.
4. The reader looks for memory that matches a pattern and synchronizes on it.

2 and 3 seem the most flexible and 2 does not rule 3 out.

This is an implementation of 2.

In addition, the function handler for fdr more aggressively checks for
finalization and makes an attempt to release its buffer.
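As a rough illustration of option 2, the sketch below shows how a reader could locate each thread's buffer once the buffer size is known from the log. It is only a sketch under stated assumptions: `bufferOffsets` and its parameters are hypothetical names that do not appear in XRay, and it assumes every buffer is written out at its full configured size directly after the file header.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of XRay. Given the total size of the log, the
// size of the file header, and the per-thread buffer size recorded in the log,
// return the byte offset at which each complete buffer starts.
std::vector<std::uint64_t> bufferOffsets(std::uint64_t FileSize,
                                         std::uint64_t HeaderSize,
                                         std::uint64_t ThreadBufferSize) {
  std::vector<std::uint64_t> Offsets;
  // A zero buffer size or a truncated header leaves nothing to walk.
  if (ThreadBufferSize == 0 || FileSize < HeaderSize)
    return Offsets;
  // Compare against the remaining bytes so the arithmetic cannot overflow.
  for (std::uint64_t Off = HeaderSize; FileSize - Off >= ThreadBufferSize;
       Off += ThreadBufferSize)
    Offsets.push_back(Off);
  return Offsets;
}
```

A reader would then parse records from each offset up to that buffer's EOB record and skip straight to the next offset. A trailing partial buffer is ignored in this sketch.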

Diff Detail

Repository: rL LLVM

Event Timeline

kpw created this revision. Mar 27 2017, 1:20 AM

Harbormaster completed remote builds in B5099: Diff 93103. Mar 27 2017, 1:20 AM

kpw mentioned this in D31385: [XRay] Update FDR log reader to be aware of buffer sizes per thread. Mar 27 2017, 1:33 AM

I think I prefer putting FDR-specific information in the padding data available for the header record. We can use more bytes in the 32-byte XRayFileHeader to encode the FDR log buffer size and, if you want, potentially the number of buffers as well.

This revision now requires changes to proceed. Mar 27 2017, 7:17 PM

I thought about using the header as well. If we go that route, we can't decide to optimize to only flush up through the EOB without confusing readers that aren't equipped with a per-buffer frame. It also seemed a little odd to have FDR-specific fields in the shared header struct, although we could always fork it into two.

That's fine if we're designing for the common case where recording is turned on for long enough to use many buffers before flushing.

We've got 30 bits left in the bitfield (although Trace.cpp ignores the timespec struct field when reading). xray_buffer_queue.h defines the buffer size as a size_t. What do you think is a reasonable maximum to limit it? I'll change or at least comment the header there to keep things honest.

In D31384#711851, @kpw wrote:

I thought about using the header as well. If we go that route, we can't decide to optimize to only flush up through the EOB without confusing readers that aren't equipped with a per-buffer frame. It also seemed a little odd to have FDR-specific fields in the shared header struct, although we could always fork it into two.

We can make it explicit by reserving the bytes right away in the header as an array of chars. This way we can pack whatever data we want in the remaining bytes in the header. Since we already record the type of the log in the header, we can get away with having a bit more information that way.

That's fine if we're designing for the common case where recording is turned on for long enough to use many buffers before flushing.

We've got 30 bits left in the bitfield (although Trace.cpp ignores the timespec struct field when reading). xray_buffer_queue.h defines the buffer size as a size_t. What do you think is a reasonable maximum to limit it? I'll change or at least comment the header there to keep things honest.

I see that this is in the head of a specific buffer. I'm not opposed to that change. I think we can put the size of the buffer as size_t in the file header, and when we're writing the records in the buffer do the computation (like you're doing) to figure out the number of bytes that's valid in the specific buffer. While the current implementation just dumps the data for the full sized buffer, I agree with you there should be nothing stopping us from writing down just the bytes that matter for a specific buffer.

I don't think we should be limiting the possible size of the buffers artificially -- it's one of those things that really ought to be tunable through flags (which it currently isn't, but is something we can do later).
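The idea of reserving the spare header bytes as a char array and letting the log type decide how to interpret them can be sketched as follows. This is a minimal illustration, not the XRay header itself: `DemoHeader`, `DemoFdrData`, `packFdrData` and `unpackFdrData` are hypothetical names, the `Flags` field stands in for the real bitfield, and `memcpy` is used instead of a union only to keep the example free of type-punning concerns. The value 1 for the FDR log type matches the `FileTypes` enum in xray_records.h.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical mode-specific payload; mirrors the idea of storing the
// per-thread buffer size in the header.
struct DemoFdrData {
  std::uint64_t ThreadBufferSize;
};

// Hypothetical 32-byte header whose last 16 bytes are reserved as raw chars.
struct DemoHeader {
  std::uint16_t Version;
  std::uint16_t Type; // 1 means FDR log, as in the FileTypes enum.
  std::uint32_t Flags;
  std::uint64_t CycleFrequency;
  char FreeForm[16];
};

static_assert(sizeof(DemoHeader) == 32, "DemoHeader != 32 bytes");
static_assert(sizeof(DemoFdrData) <= sizeof(DemoHeader::FreeForm),
              "payload must fit in the reserved bytes");

// Writer side: zero the reserved bytes, then copy the payload in.
void packFdrData(DemoHeader &H, const DemoFdrData &D) {
  std::memset(H.FreeForm, 0, sizeof(H.FreeForm));
  std::memcpy(H.FreeForm, &D, sizeof(D));
}

// Reader side: only interpret the reserved bytes when the type says FDR.
bool unpackFdrData(const DemoHeader &H, DemoFdrData &Out) {
  if (H.Type != 1)
    return false;
  std::memcpy(&Out, H.FreeForm, sizeof(Out));
  return true;
}
```

Because the reader checks the type before touching the reserved bytes, other log modes remain free to use the same 16 bytes for something else.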

Writing the buffer size to the file header rather than to a separate piece of data.

LGTM

Please rebase to the tip of trunk / master, and I'll land once that's done. :)

lib/xray/xray_fdr_logging_impl.h
Line 312 (on Diff #93324): You probably need to rebase; we're no longer using std::atomic<...> for these data structures.

This revision is now accepted and ready to land. Mar 28 2017, 5:32 PM

Rebasing onto master.

Harbormaster completed remote builds in B5149: Diff 93332. Mar 28 2017, 7:07 PM

Something in the rebase broke and I'm getting segfaults when instrumenting my test program whether I'm on my branch or master.

It's unrelated to this change, but we'll have to track it down. It happens when calling finalize().

It broke somewhere between d218b6eb8 and trunk (a lot of commits).

dberris mentioned this in D31452: [XRay][compiler-rt] Add an end-to-end test for FDR Logging. Mar 28 2017, 9:37 PM

dberris mentioned this in rL298977: [XRay][compiler-rt] Add an end-to-end test for FDR Logging. Mar 28 2017, 10:31 PM

Closed by commit rL298982: [XRay] [compiler-rt] Write buffer length to FDR log before writing buffer. (authored by dberris). Mar 28 2017, 11:09 PM

This revision was automatically updated to reflect the committed changes.

dberris mentioned this in rL298983: [XRay] Update FDR log reader to be aware of buffer sizes per thread. Mar 28 2017, 11:22 PM

Revision Contents

Path                                                  Size
compiler-rt/trunk/include/xray/xray_records.h         21 lines
compiler-rt/trunk/lib/xray/xray_buffer_queue.h        3 lines
compiler-rt/trunk/lib/xray/xray_fdr_logging.cc        10 lines
compiler-rt/trunk/lib/xray/xray_fdr_logging_impl.h    20 lines

Diff 93345

compiler-rt/trunk/include/xray/xray_records.h

[18 lines not shown]

 namespace __xray {

 enum FileTypes {
   NAIVE_LOG = 0,
   FDR_LOG = 1,
 };

+// FDR mode use of the union field in the XRayFileHeader.
+struct alignas(16) FdrAdditionalHeaderData {
+  uint64_t ThreadBufferSize;
+};
+
+static_assert(sizeof(FdrAdditionalHeaderData) == 16,
+              "FdrAdditionalHeaderData != 16 bytes");
+
 // This data structure is used to describe the contents of the file. We use this
 // for versioning the supported XRay file formats.
 struct alignas(32) XRayFileHeader {
   uint16_t Version = 0;

   // The type of file we're writing out. See the FileTypes enum for more
   // information. This allows different implementations of the XRay logging to
   // have different files for different information being stored.
   uint16_t Type = 0;

   // What follows are a set of flags that indicate useful things for when
   // reading the data in the file.
   bool ConstantTSC : 1;
   bool NonstopTSC : 1;

   // The frequency by which TSC increases per-second.
   alignas(8) uint64_t CycleFrequency = 0;

+  union {
+    char FreeForm[16];
     // The current civiltime timestamp, as retrived from 'clock_gettime'. This
     // allows readers of the file to determine when the file was created or
     // written down.
     struct timespec TS;
+
+    struct FdrAdditionalHeaderData FdrData;
+  };
 } __attribute__((packed));

 static_assert(sizeof(XRayFileHeader) == 32, "XRayFileHeader != 32 bytes");

 enum RecordTypes {
   NORMAL = 0,
 };

[30 lines not shown]

compiler-rt/trunk/lib/xray/xray_buffer_queue.h

[92 lines not shown; enclosing context: public:]
   /// - ...
   ErrorCode releaseBuffer(Buffer &Buf);

   bool finalizing() const {
     return __sanitizer::atomic_load(&Finalizing,
                                     __sanitizer::memory_order_acquire);
   }

+  /// Returns the configured size of the buffers in the buffer queue.
+  size_t ConfiguredBufferSize() const { return BufferSize; }
+
   /// Sets the state of the BufferQueue to finalizing, which ensures that:
   ///
   /// - All subsequent attempts to retrieve a Buffer will fail.
   /// - All releaseBuffer operations will not fail.
   ///
   /// After a call to finalize succeeds, all subsequent calls to finalize will
   /// fail with std::errc::state_not_recoverable.
   ErrorCode finalize();
[19 lines not shown]

compiler-rt/trunk/lib/xray/xray_fdr_logging.cc

[119 lines not shown; enclosing context: XRayLogFlushStatus fdrLoggingFlush() XRAY_NEVER_INSTRUMENT {]
   XRayFileHeader Header;
   Header.Version = 1;
   Header.Type = FileTypes::FDR_LOG;
   Header.CycleFrequency = getTSCFrequency();
   // FIXME: Actually check whether we have 'constant_tsc' and 'nonstop_tsc'
   // before setting the values in the header.
   Header.ConstantTSC = 1;
   Header.NonstopTSC = 1;
-  clock_gettime(CLOCK_REALTIME, &Header.TS);
+  Header.FdrData = FdrAdditionalHeaderData{LocalBQ->ConfiguredBufferSize()};
   retryingWriteAll(Fd, reinterpret_cast<char *>(&Header),
                    reinterpret_cast<char *>(&Header) + sizeof(Header));

   LocalBQ->apply([&](const BufferQueue::Buffer &B) {
+    uint64_t BufferSize = B.Size;
+    if (BufferSize > 0) {
       retryingWriteAll(Fd, reinterpret_cast<char *>(B.Buffer),
                        reinterpret_cast<char *>(B.Buffer) + B.Size);
+    }
   });
   __sanitizer::atomic_store(&LogFlushStatus,
                             XRayLogFlushStatus::XRAY_LOG_FLUSHED,
                             __sanitizer::memory_order_release);
   return XRayLogFlushStatus::XRAY_LOG_FLUSHED;
 }

 XRayLogInitStatus fdrLoggingFinalize() XRAY_NEVER_INSTRUMENT {
[68 lines not shown]

compiler-rt/trunk/lib/xray/xray_fdr_logging_impl.h

[305 lines not shown]
 }

 static inline void processFunctionHook(
     int32_t FuncId, XRayEntryType Entry, uint64_t TSC, unsigned char CPU,
     int (*wall_clock_reader)(clockid_t, struct timespec *),
     __sanitizer::atomic_sint32_t &LoggingStatus,
     const std::shared_ptr<BufferQueue> &BQ) XRAY_NEVER_INSTRUMENT {
   // Bail out right away if logging is not initialized yet.
-  if (__sanitizer::atomic_load(&LoggingStatus,
-                               __sanitizer::memory_order_acquire) !=
-      XRayLogInitStatus::XRAY_LOG_INITIALIZED)
-    return;
+  // We should take the opportunity to release the buffer though.
+  auto Status = __sanitizer::atomic_load(&LoggingStatus,
+                                         __sanitizer::memory_order_acquire);
+  if (Status != XRayLogInitStatus::XRAY_LOG_INITIALIZED) {
+    if (RecordPtr != nullptr &&
+        (Status == XRayLogInitStatus::XRAY_LOG_FINALIZING ||
+         Status == XRayLogInitStatus::XRAY_LOG_FINALIZED)) {
+      writeEOBMetadata();
+      auto EC = BQ->releaseBuffer(Buffer);
+      if (EC != BufferQueue::ErrorCode::Ok) {
+        Report("Failed to release buffer at %p; error=%s\n", Buffer.Buffer,
+               BufferQueue::getErrorString(EC));
+        return;
+      }
+      RecordPtr = nullptr;
+    }
+    return;
+  }

   // We use a thread_local variable to keep track of which CPUs we've already
   // run, and the TSC times for these CPUs. This allows us to stop repeating the
   // CPU field in the function records.
   //
   // We assume that we'll support only 65536 CPUs for x86_64.
   thread_local uint16_t CurrentCPU = std::numeric_limits<uint16_t>::max();
   thread_local uint64_t LastTSC = 0;
[175 lines not shown]
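
The early-exit logic added to processFunctionHook can be read as a small decision function: keep recording only while logging is initialized, and if the log is being (or has been) finalized while this thread still holds a buffer, hand the buffer back exactly once. The sketch below models that decision in isolation. `DemoStatus`, `DemoThreadState` and `shouldRecord` are hypothetical stand-ins; the counter takes the place of the real `writeEOBMetadata()` and `releaseBuffer()` calls, and the error path of `releaseBuffer()` is not modelled.

```cpp
// Hypothetical stand-in for XRayLogInitStatus.
enum class DemoStatus {
  Uninitialized,
  Initializing,
  Initialized,
  Finalizing,
  Finalized
};

// Hypothetical per-thread state: whether the thread currently owns a buffer
// (RecordPtr != nullptr in the real code) and how often it released one.
struct DemoThreadState {
  bool HoldsBuffer = false;
  int ReleaseCount = 0;
};

// Returns true when the handler should go on to write a function record.
bool shouldRecord(DemoStatus Status, DemoThreadState &T) {
  if (Status == DemoStatus::Initialized)
    return true;
  if (T.HoldsBuffer && (Status == DemoStatus::Finalizing ||
                        Status == DemoStatus::Finalized)) {
    // Stands in for writing the EOB record and releasing the buffer.
    ++T.ReleaseCount;
    T.HoldsBuffer = false;
  }
  return false;
}
```

Clearing the ownership flag after the release is what keeps later handler invocations on the same thread from releasing the buffer a second time.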