This is an archive of the discontinued LLVM Phabricator instance.

[compiler-rt][XRay] Initial per-thread inmemory logging implementation
ClosedPublic

Authored by dberris on Jul 5 2016, 12:26 AM.

Details

Summary

Depends on D21612 which implements the building blocks for the compiler-rt
implementation of the XRay runtime. We use a naive in-memory log of fixed-size
entries that get written out to a log file when the buffers are full, and when
the thread exits.

This implementation lays some foundations on to allowing for more complex XRay
records to be written to the log in subsequent changes. It also defines the format
that the function call accounting tool in D21987 will start building upon.

Once D21987 lands, we should be able to start defining more tests using that tool
once the function call accounting tool becomes part of the llvm distribution.

Diff Detail

Repository
rL LLVM

Event Timeline

dberris updated this revision to Diff 62718.Jul 5 2016, 12:26 AM
dberris retitled this revision from to WIP: Implement a per-thread inmemory log.
dberris updated this object.
dberris added reviewers: echristo, kcc.
dberris added a subscriber: llvm-commits.
majnemer added inline comments.
lib/xray/xray_inmemory_log.cc
107 ↗(On Diff #62718)

Could you please use the __rdtscp intrinsic?
Might be good to only do so if you are on X86 with a fallback of __builtin_readcyclecounter ?

116 ↗(On Diff #62718)

What if you get a partial write?

dberris marked 2 inline comments as done.Jul 5 2016, 1:11 AM
dberris added inline comments.
lib/xray/xray_inmemory_log.cc
107 ↗(On Diff #62718)

Using the __rdtscp intrinsic, let's cross the bridge for non-x86 later?

116 ↗(On Diff #62718)

Good catch! Makes the code a bit more complicated, but more correct AFAICT now.

dberris updated this revision to Diff 62722.Jul 5 2016, 1:11 AM
dberris marked 2 inline comments as done.
  • Use __rdtscp() and account for partial writes
majnemer added inline comments.Jul 5 2016, 1:25 AM
lib/xray/xray_inmemory_log.cc
123 ↗(On Diff #62722)

Isn't this code dead because your while condition requires Written to be non-zero?

dberris updated this revision to Diff 62724.Jul 5 2016, 1:31 AM
  • Remove dead code
dberris marked an inline comment as done.Jul 5 2016, 1:32 AM
dberris added inline comments.
lib/xray/xray_inmemory_log.cc
124 ↗(On Diff #62724)

Whoops, yes. Good catch. :)

dberris updated this revision to Diff 62725.Jul 5 2016, 1:37 AM
dberris marked an inline comment as done.
  • Remove demo logging, use in-memory logging
majnemer added inline comments.Jul 5 2016, 1:48 AM
lib/xray/xray_inmemory_log.cc
119 ↗(On Diff #62725)

Should we handle errno == EINTR?

dberris marked an inline comment as done.Jul 5 2016, 2:04 AM
dberris added inline comments.
lib/xray/xray_inmemory_log.cc
119 ↗(On Diff #62725)

Yes, and refactor this a bit so we do the right thing both times. :)

dberris updated this revision to Diff 62726.Jul 5 2016, 2:05 AM
dberris marked an inline comment as done.
  • Refactor a bit, make retrying write work consistently at thread exit and when buffer is full
mehdi_amini added inline comments.Jul 5 2016, 11:52 AM
lib/xray/xray_inmemory_log.cc
63 ↗(On Diff #62726)

Early exit:

if (!Offset) return;
65 ↗(On Diff #62726)

This while condition isn't clear to me: Written would be 0 if nothing is written, which seem unexpected.
What about a while (1)?

74 ↗(On Diff #62726)

If you want to handle partial write properly, I think you need to update start and offset as well.

I feel the interface for RetryingWrite would be more usual of C++ std algorithm with a Begin / End instead of start/offset.

108 ↗(On Diff #62726)

Shouldn't error be printed to stderr?

111 ↗(On Diff #62726)

This looks like debugging, is it intended to be here?
Do you really want unconditionally always on? And is stdout the right output?

dberris updated this revision to Diff 62821.Jul 5 2016, 9:46 PM
dberris marked 5 inline comments as done.
  • Handle partial writes better.
lib/xray/xray_inmemory_log.cc
111 ↗(On Diff #62726)

Yep, this is for debugging. I'm still thinking about how to control this filename either through an API or through environment variables.

majnemer added inline comments.Jul 5 2016, 10:11 PM
lib/xray/xray_inmemory_log.cc
80 ↗(On Diff #62821)

ISTM that this assert can fail in practice.

I'd maintain the buffer you are trying to write as just a char * to correctly handle this.

majnemer added inline comments.Jul 5 2016, 10:13 PM
lib/xray/xray_inmemory_log.cc
71 ↗(On Diff #62821)

I don't believe write can actually return zero for this case.

mehdi_amini added inline comments.Jul 5 2016, 10:36 PM
lib/xray/xray_inmemory_log.cc
71 ↗(On Diff #62821)

When does write can actually return zero according to the spec?

majnemer added inline comments.Jul 5 2016, 10:45 PM
lib/xray/xray_inmemory_log.cc
71 ↗(On Diff #62821)

From my recollection, it should only be possible if you pass write a count of zero bytes.

dberris marked 4 inline comments as done.Jul 8 2016, 12:12 AM
dberris added inline comments.
lib/xray/xray_inmemory_log.cc
71 ↗(On Diff #62821)

Ah, that's right. I was confused by the documentation too. Fixed.

80 ↗(On Diff #62821)

Good catch. Done.

dberris updated this revision to Diff 63181.Jul 8 2016, 12:13 AM
dberris marked 2 inline comments as done.
  • Rebase
  • Support partial writes better-er
mehdi_amini added inline comments.Jul 8 2016, 10:46 AM
lib/xray/xray_inmemory_log.cc
72 ↗(On Diff #63181)

I does not matter much for this patch, but for my own curiosity, do you have a pointer to the doc that mentions the zero returned case?

122 ↗(On Diff #63181)

stderr maybe?

This is still not ready, but the patch we're dependent on is ready (D21612). I will continue to iterate on this to clean it up a bit more, and add more useful information into the head of the log to help with things like figuring out how much walltime elapsed between two TSC ticks where the binary was run, a version identifier for the log, etc.

lib/xray/xray_inmemory_log.cc
73 ↗(On Diff #64106)

Link I can find: http://linux.die.net/man/2/write

It says pretty much that it will only return 0 if the size provided was 0. I don't know why anybody would do that, but I guess the API allows it so there's a special case for that. :)

123 ↗(On Diff #64106)

Yeah, I think a better way of doing this is to provide the log file base name as a flag (with a sane default, maybe synthesized as "/tmp/<pid>.xray"), then that means we don't need to write it out unless we're being verbose.

dberris updated this revision to Diff 64299.Jul 18 2016, 4:37 AM
  • Expose the log record definitions and write out headers at the beginning of the log file
dberris updated this revision to Diff 64638.Jul 19 2016, 9:30 PM
  • Undo changes in assembler

Question for Reid/Medhi/Eric:

Now that D21612 has landed I'm thinking of adding tests here to be able to do some sanity-checking of the generated log file. That though means that the "best" means of doing this is by using a tool that will output the information in a verifiable manner. We have some options in that regard:

  • Implement a stand-alone tool in compiler-rt just for testing. This seems brittle but should be simple enough to do.
  • Land D21987 in LLVM and then potentially depend on that from compiler-rt (although I'm not sure if the dependency chain would be correct).

I'm leaning towards the first solution, but I'd like to hear your thoughts on how best to go about this too.

Cheers

dberris added a reviewer: rnk.Jul 20 2016, 7:33 AM
dberris updated this revision to Diff 65345.Jul 25 2016, 7:37 AM
  • WIP: Adding one test, but still failing

I'm currently getting the following error:

		lit.py: /usr/local/google/home/dberris/xray/llvm/utils/lit/lit/TestingConfig.py:115: fatal: unable to parse config file '/usr/local/google/home/dberris/xray/llvm/projects/compiler-rt/test/xray/lit.cfg', traceback: Traceback (most recent call last):
			File "/usr/local/google/home/dberris/xray/llvm/utils/lit/lit/TestingConfig.py", line 102, in load_from_path
				exec(compile(data, path, 'exec'), cfg_globals, None)
			File "/usr/local/google/home/dberris/xray/llvm/projects/compiler-rt/test/xray/lit.cfg", line 6, in <module>
				config.name = 'XRay' + config.name_suffix
		AttributeError: TestingConfig instance has no attribute 'name_suffix'

I'm not seeing what the difference is though with other testing set-ups for other compiler-rt tests.

dberris updated this revision to Diff 65464.Jul 25 2016, 8:01 PM
  • Fixup CMake config and lit configuration
  • Fixup test case to clean up after itself

This is now ready for another look.

dberris retitled this revision from WIP: Implement a per-thread inmemory log to [compiler-rt][XRay] Initial per-thread inmemory logging implementation.Jul 25 2016, 8:05 PM
dberris updated this object.
dberris updated this revision to Diff 65466.Jul 25 2016, 8:19 PM
  • Remove redundant/wrong addition of COMPILER_RT_BUILD_XRAY
dberris updated this revision to Diff 65661.Jul 26 2016, 10:06 PM
  • Rename xray_trampoline_x86.S to xray_trampoline_x86_64.S

Based on review comments from D21612.

dberris updated this revision to Diff 66433.Aug 1 2016, 11:33 PM

Rebase again, with fix to math for stack adjustment.

majnemer added inline comments.Aug 9 2016, 10:05 PM
include/xray/xray_records.h
41–42 ↗(On Diff #67112)

Should these be bitfields?

lib/xray/xray_inmemory_log.cc
47 ↗(On Diff #67112)

Are we following LLVM convention for XRay? If so, I'd make this function start with a lowercase letter.

52–53 ↗(On Diff #67112)

This should be Written < 0. I don't believe errno will be set to anything good if write returns 0; I think you'd observe a previous errno value.

73 ↗(On Diff #67112)

Why noexcept?

90 ↗(On Diff #67112)

Ditto regarding llvm convention.

90 ↗(On Diff #67112)

Should the long be a long long?

91 ↗(On Diff #67112)

Should we pass in O_CLOEXEC?

95–96 ↗(On Diff #67112)

How do you feel about these potentially failing?

98 ↗(On Diff #67112)

Shouldn't this be s64?

156 ↗(On Diff #67112)

Any reason not to just use 0?

175 ↗(On Diff #67112)

I'd just use unsigned here to match how __rdtscp is defined.

dberris updated this revision to Diff 67464.Aug 9 2016, 10:37 PM
dberris marked 11 inline comments as done.

Rebase + style comments and some correctness fixes

include/xray/xray_records.h
41–42 ↗(On Diff #67112)

Good idea, yep.

lib/xray/xray_inmemory_log.cc
47 ↗(On Diff #67112)

Yes, we should be -- sorry I haven't quite internalised the rules. :/

52–53 ↗(On Diff #67112)

Good catch, thanks. Done.

73 ↗(On Diff #67112)

Habbit... removed.

91 ↗(On Diff #67112)

Good idea -- yes, done.

95–96 ↗(On Diff #67112)

Not very good... is it better now?

majnemer added inline comments.Aug 15 2016, 8:20 PM
include/xray/xray_records.h
40–42 ↗(On Diff #68126)

Why not give these bitfields an underlying type of bool?

lib/xray/xray_inmemory_log.cc
71 ↗(On Diff #68126)

Should we assert that Fd > 0?

77 ↗(On Diff #68126)

Why do we have this fsync? Might deserve a comment.

95–99 ↗(On Diff #68126)

This seems wrong in the face of partial reads. Perhaps a retryingRead is called for?

dberris updated this revision to Diff 68136.Aug 15 2016, 11:30 PM
dberris marked 3 inline comments as done.
  • Address review comments
  • Make use of the naive log configurable
  • Add test to make sure options from args works
dberris added inline comments.Aug 15 2016, 11:31 PM
lib/xray/xray_inmemory_log.cc
71 ↗(On Diff #68126)

Instead of asserting, I check that Fd is valid before using it in the write in the destructor.

77 ↗(On Diff #68126)

Explained a little in a comment, essentially because we're not doing any sort of reference counting (for simplicity's sake) we rely on the sync to flush the data at a thread's exit. We can implement reference counting, but that would complicate this a bit more than necessary IMO.

majnemer added inline comments.Aug 15 2016, 11:37 PM
include/xray/xray_records.h
39–42 ↗(On Diff #68136)

You could gain another byte for Padding by sending this between Type and CycleFrequency.

lib/xray/xray_inmemory_log.cc
70 ↗(On Diff #68136)

ssize_t?

dberris updated this revision to Diff 68141.Aug 15 2016, 11:59 PM
dberris marked an inline comment as done.
  • Re-order some flags in XRayFileHeader
dberris updated this revision to Diff 68142.Aug 16 2016, 12:02 AM
dberris marked an inline comment as done.
  • Use ssize_t for BytesRead
rSerge requested changes to this revision.Aug 22 2016, 8:17 AM
rSerge added a reviewer: rSerge.
rSerge added a subscriber: rSerge.

Please, see inline.

include/xray/xray_records.h
47 ↗(On Diff #68142)

So CycleFrequency is an 8-byte variable not aligned to 8 bytes. This may cause portability issues to other CPUs, not only performance drops for unaligned memory access, but they say on some CPUs unaligned access is completely forbidden.

lib/xray/xray_inmemory_log.cc
71 ↗(On Diff #68142)

So the loop runs while BytesRead<0 ? Shouldn't it run while BytesToRead > 0?
Then in an if, before checking errno, you can check whether BytesRead < 0.

90 ↗(On Diff #68142)

So retryingWrite writes all bytes, while retryingRead reads some bytes up to the given number of bytes, right? Then their parameters should indicate this, or the functions should be renamed to e.g. retryingWriteAll and retryingReadSome .

130 ↗(On Diff #68142)

This crashes if Buffer contains formatting, e.g. %d (random memory read) or worse %n (random memory write), but no formatting arguments are given. Consider changing to `fprintf(stderr, "%s", Buffer);

135 ↗(On Diff #68142)

(Minor) Why not to have BuffLen as a local static const? This would remove "action at a distance" anti-pattern.

137 ↗(On Diff #68142)

Why not thread_local ? How do you ensure no 2 threads race to check whether Fd is initialized and, if not, initializing it?

139 ↗(On Diff #68142)

Why do you call it Printf? f in printf stays for formatted, I guess. But PrintToStdErr doesn't support formatting. Consider renaming to SetPrintAndReportCallback .

141 ↗(On Diff #68142)

Would be UB in a race condition.

142 ↗(On Diff #68142)

How the magic 246 number is calculated?

143 ↗(On Diff #68142)

Is 255 logically equal to sizeof(TmpFilename)-1 ? The latter would be preferred over a magic number, and it's easier then to change just one place - the declaration of TmpFilename variable.

147 ↗(On Diff #68142)

Consider making "XXXXXX" a constant, replacing 6 with its sizeof-1.

159–163 ↗(On Diff #68142)

What if the system has multiple CPU sockets, with different CPUs installed so that TSC frequencies are different for different cores?

166 ↗(On Diff #68142)

Why not to report an error?

This revision now requires changes to proceed.Aug 22 2016, 8:17 AM
dberris updated this revision to Diff 68969.Aug 23 2016, 5:08 AM
dberris edited edge metadata.
dberris marked 12 inline comments as done.

Address comments

dberris added inline comments.Aug 23 2016, 5:10 AM
include/xray/xray_records.h
47 ↗(On Diff #68142)

Are you aware of specific CPUs where this would be an issue?

I'm happy to change it, but I'd like to understand whether this is an actual problem, or just a hypothetical one.

Note that we don't read this, we only ever write it down, so the performance hit isn't as important as maximising the packing of the bytes that eventually make it to disk. The 15 byte padding gives us 15 more bytes to use for later extending the information in the file header.

lib/xray/xray_inmemory_log.cc
137 ↗(On Diff #68142)

C++11 ensures this for us, that the first time this function is called, this static variable is initialised exactly once -- this is meant to be synchronised across all threads too.

139 ↗(On Diff #68142)

This API is defined by the common sanitizer tools -- I'm just a using it. :)

There are places where Report(...) is called with a format string and additional arguments (like printf), and underneath it dispatches to a callback when these formatted print's are actually invoked. We're providing something that deals with the string that's post-formatted.

141 ↗(On Diff #68142)

C++11 ensures this is not a racy initialisation.

142 ↗(On Diff #68142)

I was being conservative here, giving an allowance of 10 bytes difference between the size of the temporary filename's buffer size total and the base filename provided as a flag. Fixed it to do relative calculations instead of magic numbers.

159–163 ↗(On Diff #68142)

I'm not sure how this works in non-x86, but we assume that all the CPUs will run in the same rate, and use that as an approximation for the TSC frequency rate. This means if we're on this branch of the conditional, we don't get the reliable number and take a guess anyway.

sdardis added inline comments.
include/xray/xray_records.h
48 ↗(On Diff #68969)

Apologies for the drive-by comment.

Pre MIPSR6 requires different instructions (e.g. swl/swr to store 4 bytes) to write unaligned data if it is not naturally aligned to its size. Attempting to access 2/4/8 byte words with a normal load/store triggers a CPU level address exception if unaligned.

For MIPSR6 unaligned support is required, but it is implementation dependent how this is achieved. For example, an implementation could support unaligned access within a cacheline but require OS intervention if the unaligned access crosses a cacheline.

IIRC x86, PowerPC, AArch64 have native support for unaligned access. SPARC, ARM typically don't.

dberris planned changes to this revision.Aug 23 2016, 5:52 AM
dberris added inline comments.
include/xray/xray_records.h
48 ↗(On Diff #68969)

Apologies for the drive-by comment.

No need for apologies, comments most appreciated!

Pre MIPSR6 requires different instructions (e.g. swl/swr to store 4 bytes) to write unaligned data if it is not naturally aligned to its size. Attempting to access 2/4/8 byte words with a normal load/store triggers a CPU level address exception if unaligned.

For MIPSR6 unaligned support is required, but it is implementation dependent how this is achieved. For example, an implementation could support unaligned access within a cacheline but require OS intervention if the unaligned access crosses a cacheline.

IIRC x86, PowerPC, AArch64 have native support for unaligned access. SPARC, ARM typically don't.

Cool, this is good to know. Happy to change this implementation to be more portable to more platforms. :)

dberris updated this revision to Diff 68988.Aug 23 2016, 6:49 AM
dberris edited edge metadata.
  • Undo changes to CMakeLists.txt
  • Align CycleFrequency to 8 byte boundaries.
dberris marked 2 inline comments as done.Aug 23 2016, 6:52 AM

Address comments on alignment of CycleFrequency.

rSerge accepted this revision.Aug 23 2016, 7:26 AM
rSerge edited edge metadata.

Looks good to me.

lib/xray/xray_inmemory_log.cc
160–164 ↗(On Diff #68988)

I meant a x86/x86_64 system, consumer motherboards can easily have 2 sockets to install 2 different CPUs. Non-x86/x86_64 system is a harder question: a replacement for TSC/RDTSC would be needed.

This revision is now accepted and ready to land.Aug 23 2016, 7:26 AM
dberris updated this revision to Diff 69321.Aug 25 2016, 11:40 PM
dberris edited edge metadata.

Rebase before landing.

This revision was automatically updated to reflect the committed changes.
compiler-rt/trunk/lib/xray/xray_inmemory_log.cc