This is an archive of the discontinued LLVM Phabricator instance.

[XRay][compiler-rt] XRay Flight Data Recorder Mode
ClosedPublic

Authored by dberris on Nov 23 2016, 12:58 AM.

Details

Summary

In this change we introduce the notion of a "flight data recorder" mode
for XRay logging, where XRay logs in-memory first, and write out data
on-demand as required (as opposed to the naive implementation that keeps
logging while tracing is "on"). This depends on D26232 where we
implement the core data structure for holding the buffers that threads
will be using to write out records of operation.

This implementation only currently works on x86_64 and depends heavily
on the TSC math to write out smaller records to the inmemory buffers.

Also, this implementation defines two different kinds of records with
different sizes (compared to the current naive implementation): a
MetadataRecord (16 bytes) and a FunctionRecord (8 bytes). MetadataRecord
entries are meant to write out information like the thread ID for which
the metadata record is defined for, whether the execution of a thread
moved to a different CPU, etc. while a FunctionRecord represents the
different kinds of function call entry/exit records we might encounter
in the course of a thread's execution along with a delta from the last
time the logging handler was called.

While this implementation is not exactly what is described in the
original XRay whitepaper, this one gives us an initial implementation
that we can iterate and build upon.

Diff Detail

Repository
rL LLVM

Event Timeline

dberris updated this revision to Diff 79042.Nov 23 2016, 12:58 AM
dberris retitled this revision from to [XRay][compiler-rt] XRay Flight Data Recorder Mode.
dberris updated this object.
dberris added reviewers: echristo, rSerge, majnemer.
dberris added a subscriber: llvm-commits.
rSerge added inline comments.Nov 24 2016, 10:39 AM
lib/xray/xray_fdr_logging.cc
31 ↗(On Diff #79042)

Dead code?

57 ↗(On Diff #79042)

Wouldn't it be better to implement both via .reset(new ...), or use std::make_unique here?

172 ↗(On Diff #79042)

Isn't the following simpler?
*reinterpret_cast<uint64_t *>(NewCPUId.Data+sizeof(uint16_t)) = TSC;

lib/xray/xray_fdr_logging.h
33 ↗(On Diff #79042)

On some platform 1-bit signed integer will have values of 0 and -1, never 1. Also, see the comment for RecordKind.

44 ↗(On Diff #79042)

If int is used here and for Type member, doesn't the structure take 19 bytes at least with some compilers? (because int bitfields take at least 4 bytes, I suppose). Can you change the type to uint8_t for these 2 members?

dberris updated this revision to Diff 79374.Nov 28 2016, 12:49 AM
  • Initial implementation of logging API
  • Refactor a bit, plug FDR mode into the runtime right away
dberris planned changes to this revision.Nov 28 2016, 12:52 AM

This is still undergoing quite a bit of churn, as I work through and experiment a bit more on the API and implementation/testing strategies. I may break this down into smaller changes, but working through the implementation this way allows for more exploration.

Please hold off on review until the changes become more stable and less churn-y.

dberris updated this revision to Diff 80377.Dec 5 2016, 8:28 PM
  • Update test to make sure we are writing data down into the log
dberris updated this revision to Diff 80383.Dec 6 2016, 12:20 AM

Rebase now that the patch we're dependent on has landed.

dberris added subscribers: kcc, eugenis.

Adding @kcc and @eugenis for compiler-rt style, etc.

This is now ready for a look.

rSerge edited edge metadata.Dec 12 2016, 8:43 AM

Ping? @rSerge

Sorry, Phabricator updates are too easy to lose track of. I'm now looking at it...

rSerge requested changes to this revision.Dec 13 2016, 5:44 AM
rSerge edited edge metadata.
rSerge added inline comments.
lib/xray/CMakeLists.txt
14–22 ↗(On Diff #80383)

Is it a hiccup in the diff? Or is there really a duplication of set(x86_64_SOURCES?

lib/xray/xray_buffer_queue.cc
53 ↗(On Diff #80383)

I think here should be either a comment on what true means, or usage of dedicated enum like BufferUsageType with values Used and Unused.

68 ↗(On Diff #80383)

I think the standard requires that free(nullptr) is safe in all C implementations.

lib/xray/xray_buffer_queue.h
42 ↗(On Diff #80383)

A comment here about the meaning of bool part would help.

lib/xray/xray_fdr_logging.h
38 ↗(On Diff #80383)

I think the base value should be specified here, I doubt it defaults to 0 in all C++ implementations. The base value would help ensuring that enum values fit 7 bits.

This revision now requires changes to proceed.Dec 13 2016, 5:44 AM
dberris updated this revision to Diff 81345.Dec 13 2016, 9:38 PM
dberris edited edge metadata.
dberris marked 6 inline comments as done.
  • Remove unncessary target in test
  • Various improvements, based on comments

Thanks @rSerge -- you made me look into the C++11 standard, and find a good way of doing the bit-fields. PTAL

lib/xray/CMakeLists.txt
14–22 ↗(On Diff #80383)

Nope that's a merge failure on my part. :)

lib/xray/xray_buffer_queue.cc
53 ↗(On Diff #80383)

An enum seems overkill here. Added a comment in both the data member declaration and here.

lib/xray/xray_fdr_logging.cc
57 ↗(On Diff #79042)

We don't have std::make_unique yet, since I think we're still stuck with C++11.

lib/xray/xray_fdr_logging.h
38 ↗(On Diff #80383)

A standard-conforming C++ implementation ought to set the first element's value to be zero (in [decl.enum]p2 on the C++11 FCD).

44 ↗(On Diff #79042)

Good point -- I did one better I think, since we can actually use enum classes for the fields that do need enums -- so instead of working around on raw values and bytes, we can use the compiler's help to make this work "better" or at least be semantically correct.

rSerge added inline comments.Dec 16 2016, 8:02 AM
lib/xray/xray_buffer_queue.cc
25 ↗(On Diff #81345)

If Tmp==nullptr, it's a memory allocation error that should be reported to the client/outer code (and further void *Tmp = malloc(BufferSize); will usually fail).
A good way of doing this would be to throw an exception, but I doubt it's allowed in LLVM because RTTI is banned.
So maybe consider an initialization method returning the error number or true/false to the calling code?

lib/xray/xray_fdr_logging.cc
186 ↗(On Diff #81345)

Why not to put here simply auto &NewBuffer = *reinterpret_cast<MetadataRecord *>(Records); ?

195 ↗(On Diff #81345)

How about auto &WalltimeMarker = *reinterpret_cast<MetadataRecord *>(Records+1); ?

57 ↗(On Diff #79042)

Oh, I missed that it's only since C++14.

lib/xray/xray_fdr_logging.h
29 ↗(On Diff #81345)

I think it should be enum class RecordType : uint8_t {

38 ↗(On Diff #80383)

Ok.

dberris updated this revision to Diff 81904.Dec 18 2016, 5:47 PM
dberris marked 3 inline comments as done.
dberris edited edge metadata.
  • Address comments in review

Thanks for the review @rSerge -- please have another look.

lib/xray/xray_buffer_queue.cc
25 ↗(On Diff #81345)

Good point. Added the bool output parameter instead.

lib/xray/xray_fdr_logging.cc
186 ↗(On Diff #81345)

The explicit version here makes it clear that Records is an array, and so we'd like to make it clear that we're getting the address of the first element.

195 ↗(On Diff #81345)

The pointer arithmetic seems less clear than taking the address of the object at index 1 of the Records array.

rSerge accepted this revision.Dec 21 2016, 5:19 AM
rSerge edited edge metadata.
This revision is now accepted and ready to land.Dec 21 2016, 5:19 AM
This revision was automatically updated to reflect the committed changes.
dberris reopened this revision.Jan 2 2017, 8:16 PM

Landing this broke aarch64 and arm.

This revision is now accepted and ready to land.Jan 2 2017, 8:16 PM
dberris updated this revision to Diff 82998.Jan 3 2017, 8:36 PM
dberris edited edge metadata.

Rebase, this time include the right headers.

pelikan added a subscriber: pelikan.Jan 4 2017, 1:37 AM

Very quick review based on this code not compiling.

include/xray/xray_records.h
45 ↗(On Diff #82998)

timespec comes from clock_gettime(2), gettimeofday(2) returns timeval which has microseconds (which are no good here).

lib/xray/xray_fdr_logging.cc
317 ↗(On Diff #82998)

I really don't see how this *ever* compiled because it isn't #included; it fails for me for native amd64 as well as cross armv7/v8. My recent refactoring introduced readTSC(&CPU) in xray_x86_64.h or xray_emulate_tsc.h respectively, please use that. The seven line #ifdef #include chain can probably be put into some (internal) header if it needs to be repeated twice (but doesn't have to).

379–384 ↗(On Diff #82998)

Can the formatting follow the 80-character lines properly, instead of having one word per line?

lib/xray/xray_fdr_logging.h
37 ↗(On Diff #82998)

In file included from /home/pelikan/src/llvm/projects/compiler-rt/lib/xray/xray_fdr_logging.cc:17:0:
/home/pelikan/src/llvm/projects/compiler-rt/lib/xray/xray_fdr_logging.h:37:21: warning: ‘xray::MetadataRecord::Type’ is too small to hold all values of ‘enum class xray::RecordType’

RecordType Type : 1;
                  ^

/home/pelikan/src/llvm/projects/compiler-rt/lib/xray/xray_fdr_logging.h:48:28: warning: ‘xray::MetadataRecord::RecordKind’ is too small to hold all values of ‘enum class xray::MetadataRecord::RecordKinds’

RecordKinds RecordKind : 7; // Use 7 bits to identify this record type.
                         ^

/home/pelikan/src/llvm/projects/compiler-rt/lib/xray/xray_fdr_logging.h:56:21: warning: ‘xray::FunctionRecord::Type’ is too small to hold all values of ‘enum class xray::RecordType’

RecordType Type : 1;
                  ^

/home/pelikan/src/llvm/projects/compiler-rt/lib/xray/xray_fdr_logging.h:62:28: warning: ‘xray::FunctionRecord::RecordKind’ is too small to hold all values of ‘enum class xray::FunctionRecord::RecordKinds’

RecordKinds RecordKind : 3;

Is this a gcc (6.3.0 used) bug?

lib/xray/xray_inmemory_log.cc
92 ↗(On Diff #82998)

Why is this going back into a lambda? I thought we wanted to separate hot code from one-off initialization for readability.

dberris updated this revision to Diff 83177.Jan 4 2017, 7:26 PM
dberris marked 2 inline comments as done.
  • Address review comments

Thanks @pelikan -- PTAL?

include/xray/xray_records.h
45 ↗(On Diff #82998)

This is fine, we use this at the header to indicate some rough time when a trace was created. It's not used anywhere else.

lib/xray/xray_fdr_logging.cc
317 ↗(On Diff #82998)

Right -- this code was written before that refactoring. I'll do that now. FWIW, clang doesn't complain about this.

lib/xray/xray_fdr_logging.h
37 ↗(On Diff #82998)

Unsure whether this is a bug, but that's a good question. RecordType is an enum class whose base is an unsigned 8-bit integral type -- and only has two values. 1 bit ought to be enough, no?

lib/xray/xray_inmemory_log.cc
92 ↗(On Diff #82998)

Ah, this was a bad merge. Fixed.

dberris updated this revision to Diff 83183.Jan 4 2017, 7:49 PM
  • Update comment for TS (gettimeofday -> clock_gettime)

Apologies for yet another round trip of bad news :-(

lib/xray/xray_fdr_logging.cc
203 ↗(On Diff #83183)

warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]

I don't usually care about strict aliasing but I hear that real compiler people do ;-) I think what this wants you to do is a temporary + a memcpy(x, y, 4);.

208 ↗(On Diff #83183)

Due to the unique way Linux handles its ABI compatibility, this won't work on older 32-bit architectures (and yes, they will all break in 2038). Currently it does error the ARM cross-build. (OpenBSD fixed this a couple of years ago but XRay isn't supported there yet due to W^X .text enforcement.)

214 ↗(On Diff #83183)

GCC says this will always overflow. Can you please add a static_assert() that sizeof TS <= sizeof MetadataRecord::Data? Although, that alone won't exactly help AArch64 to build...

229–230 ↗(On Diff #83183)

warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]

On both of these. Here the memcpy is obvious because there's a chance it'll get replaced with a builtin and optimised correctly.

252 ↗(On Diff #83183)

warning: dereferencing type-punned pointer will break strict-aliasing rules [-Wstrict-aliasing]

483 ↗(On Diff #83183)

1L<<32 does upset the 32-bit build. Also, because LastTSC is unsigned, the auto is unsigned and this wants to be ULL.

pelikan added inline comments.Jan 5 2017, 2:13 AM
lib/xray/xray_fdr_logging.cc
214 ↗(On Diff #83183)

News from AArch64 land:

$ echo "#include <time.h>\nint main() { return sizeof(struct timespec); }" > a.c
$ aarch64-linux-gnu-gcc a.c && LD_LIBRARY_PATH=/usr/aarch64-linux-gnu/lib64/ qemu-aarch64 ./a.out || echo $?
16

rSerge added inline comments.Jan 5 2017, 6:12 AM
lib/xray/xray_fdr_logging.h
37 ↗(On Diff #82998)

Does the warning disappear if the integer values for enum members are specified explicitly?

rSerge requested changes to this revision.Jan 5 2017, 6:12 AM
rSerge edited edge metadata.
This revision now requires changes to proceed.Jan 5 2017, 6:12 AM
dberris updated this revision to Diff 83343.Jan 5 2017, 8:33 PM
dberris edited edge metadata.
dberris marked 6 inline comments as done.
  • Address more compiler errors on gcc for arm and aarch64
pelikan added inline comments.Jan 6 2017, 12:05 AM
lib/xray/xray_fdr_logging.cc
238 ↗(On Diff #83343)

uint16_t is not 4 bytes. Maybe it is allowed by the spec (they are inequalities, I don't remember which way) but I'd punish every real-world implementation that made u16 out of 4 8-bit bytes just out of spite ;-) If there's struct padding involved, call it u32. If not, call it 2 bytes.

241–242 ↗(On Diff #83343)

Big endian won't like writing over those two bytes very much. Is this code amd64 only or for every platform?

lib/xray/xray_fdr_logging.h
37 ↗(On Diff #82998)

It does not disappear. It appears to be a genuine bug. I'll try with the trunk GCC and then file a minimal case as a bug report.

pelikan added inline comments.Jan 6 2017, 8:44 AM
lib/xray/xray_fdr_logging.h
37 ↗(On Diff #82998)

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61414 was there already, since 2014. Sigh.

dberris updated this revision to Diff 84083.Jan 12 2017, 12:40 AM
dberris marked 2 inline comments as done.
dberris edited edge metadata.

PTAL @pelikan and/or @rSerge

lib/xray/xray_fdr_logging.cc
238 ↗(On Diff #83343)

Whoops -- good catch!

No, I really meant to use uint16_t which gives us 2 bytes worth of CPU id's -- so in essence that's 2^16 potential CPU ids.

241–242 ↗(On Diff #83343)

Yep, fixed to use sizeof(CPU) instead.

rSerge added inline comments.Jan 13 2017, 11:17 AM
include/xray/xray_log_interface.h
23–27 ↗(On Diff #84083)

Still, for debugging purpose I think it would be more convenient to have the numeric values along enum member names... say in debug I see "3" - I would have to count from the beginning of the enum to determine whether it's XRAY_LOG_FINALIZING or XRAY_LOG_FINALIZED.

rSerge accepted this revision.Jan 16 2017, 9:42 AM
rSerge edited edge metadata.

I don't see anything wrong with it so far.

This revision is now accepted and ready to land.Jan 16 2017, 9:42 AM
dberris updated this revision to Diff 85322.Jan 22 2017, 9:53 PM
dberris marked an inline comment as done.
  • Address comments by @pelikan
  • Add explicit numbers to enum values for easier debugging.
dberris added inline comments.Jan 22 2017, 9:57 PM
include/xray/xray_log_interface.h
23–27 ↗(On Diff #84083)

Good point, thanks.

dberris updated this revision to Diff 85682.Jan 24 2017, 7:22 PM
  • Remove problematic static asserts
This revision was automatically updated to reflect the committed changes.