Index: llvm/docs/XRayFramework.rst
===================================================================
--- /dev/null
+++ llvm/docs/XRayFramework.rst
@@ -0,0 +1,469 @@
+==============
+XRay Framework
+==============
+
+:Version: 1 as of 2019-01-02
+
+.. contents::
+   :local:
+
+Introduction
+============
+
+XRay is a function call tracing system combining compiler-inserted
+instrumentation and a runtime library for intercepting these function calls
+(and other kinds of events). This document describes the runtime framework
+through which the XRay runtime implementations included in the compiler-rt
+project can be used to implement user-defined logging implementations.
+
+This document is structured both as a reference guide as well as a tutorial. It
+is intended to be read in order, presenting the APIs and the hooks for how an
+implementation can be plugged into the framework.
+
+Background
+==========
+
+XRay's runtime library is made up of the following parts:
+
+* A set of APIs for handling the management and life cycle of a logging
+  implementation.
+* The instrumentation point patching/unpatching routines.
+* A set of APIs for controlling the functions invoked when the instrumentation
+  points are enabled.
+
+We're first going to start with the typical life cycle of a logging
+implementation. This is the user-facing interface that you'll typically find
+integrated in some form of control surface (through some control mechanism,
+usually something more integral to the application being traced).
+
+In this document we will walk through how an example mode could be implemented
+and registered through the logging implementation. We'll refer to
+implementations plugged in through the XRay framework as "modes" throughout
+this document. There are three modes that come with the LLVM XRay distribution:
+Basic, Flight Data Recorder (FDR), and Profiling.
+
+Controlling Modes
+=================
+
+We can control the various modes programmatically using functions exposed in
+the ``xray_log_interface.h`` header. This header comes with the compiler-rt
+project. To initialize a mode, we can refer to them through mode identifier
+strings and calling the ``__xray_log_init_mode(...)`` function.
+
+::
+
+  #include <xray/xray_log_interface.h>
+
+  int main(int argc, char* argv[]) {
+    // Initialize a mode called "xray-basic", with options.
+    auto init_status = __xray_log_init_mode(
+        "xray-basic", "func_duration_threshold_us=0");
+    if (init_status != XRayLogRegisterStatus::XRAY_REGISTRATION_OK) {
+      // We have encountered an error here.
+    }
+
+    // ... continued ...
+
+Once we've initialised the mode we want, we can then "patch" the currently
+running program using a call to ``__xray_patch()``.
+
+::
+
+  // ... continuation of main (...) ...
+
+    auto patch_status = __xray_patch();
+    if (patch_status != XRayPatchingStatus::SUCCESS) {
+      // We have failed to patch the currently running program.
+    }
+
+    // ... continued ...
+
+
+The patching operation will modify the instrumentation points in the program to
+intercept function entry and exit events, along with other special "custom" and
+typed events that might be used. Because we've initialised the specific mode
+with a specific set of options, once the patched functions run, the handlers
+associated with the mode will be invoked when these events are encountered.
+
+At this point, we can wait, spawn threads, perform the core function of
+the program, and then eventually, we can decide to finalize and flush the logs.
+To do that, we can call the appropriate functions:
+
+::
+  // ... continuation of main (...) ...
+
+    auto finalize_status = __xray_log_finalize();
+    if (finalize_status != XRayLogInitStatus::XRAY_LOG_FINALIZED) {
+      // We have filed to finalize the implementation.
+    }
+
+    auto flush_status = __xray_log_flushLog();
+    if (flush_status != XRayLogFlushStatus::XRAY_LOG_FLUSHED) {
+      // We have failed to flush the log.
+    }
+
+    // ... continued ...
+
+From here, we can also call ``__xray_unpatch()`` to undo the patching we've done
+earlier, which partially undoes the effects of the patching operation.
+
+::
+
+  // ... continuation of main (...) ...
+    patch_status = __xray_unpatch();
+    if (patch_status != XRayPatchingStatus::SUCCESS) {
+      // We have failed to unpatch the currently running program.
+    }
+
+  }  // end of main(...).
+
+This is the basic flow of control for using modes that have been registered and
+known to the XRay framework.
+
+Registering Modes
+=================
+
+The key data structure for modes we can register to the implementation is the
+``XRayLogImpl`` type, defined in ``xray_log_interface.h``. More thorough
+documentation can be found in the header, but here we'll provide an abridged
+version of the struct, and dive into the details of some of the
+assumptions/requirements of each of the pluggable functions described in the
+struct.
+
+::
+
+  struct XRayLogImpl {
+    XRayLogInitStatus (*log_init)(size_t, size_t, void*, size_t);
+    XRayLogInitStatus (*log_finalize)();
+    void (*handle_arg0)(int32_t, XRayEntryType);
+    XRayLogFlushStatus (*flush_log)();
+  }
+
+All the function pointers in the struct are required, when providing a minimal
+implementation of a mode. The four functions are:
+
+*  ``log_init``: a function pointer which takes two deprecated and unused
+  ``size_t`` arguments, a pointer to data provided through either
+  ``__xray_log_init_mode(...)`` or ``__xray_log_init_mode_bin(...)``. This
+  function must return a value of ``XRayLogInitStatus``.
+* ``log_finalize``: a function pointer which is invoked when
+  ``__xray_log_finalize()`` is invoked.
+* ``handle_arg0``: a function pointer which is invoked when a function-related
+  event is encountered by the tracing framework. This function must be all of
+  the following:
+
+    * *Re-entrant* -- this function must handle the case when it's being called
+      even while it is already executing (as in the case of an instrumented
+      signal handler).
+    * *Async-signal safe* -- this function may be invoked in a signal handler,
+      while there are other signals that are queued up.
+    * *Thread-safe* -- this function may be called in multiple concurrent
+      threads.
+
+* ``flush_log``: a function pointer which is invoked when
+  ``__xray_log_flushLog()`` is invoked.
+
+These functions are the minimum required function pointers for any mode
+implementation that can be registered with the XRay framework. However,
+implementations that would like to provide some optional functionality can call
+some special extension functions in ``xray_interface.h`` and
+``xray_log_interface.h`` in the mode initialisation function provided in the
+XRayLogImpl structure. These are:
+
+* ``__xray_set_handler_arg1(...)`` which takes a function pointer which handles
+  function events for functions that have the first argument to be logged.
+* ``__xray_set_customevent_handler(...)`` which takes a function pointer which
+  is invoked when a custom event is encountered. Custom events are those that
+  appear in source code as either C/C++ calls to ``__xray_customevent(...)`` or
+  in LLVM IR as ``llvm.xray.customevent(...)``.
+* ``__xray_log_set_buffer_iterator(...)`` lets modes provide an "iterator"
+  function which returns pointers and extents for buffers (if there are any)
+  associated with the mode.
+
+When modes register an iterator function for the buffer iterators, users can
+call the ``__xray_log_process_buffers(...)`` function providing a function
+pointer which is invoked for every buffer provided by the iterator functions.
+
+Modes can then register a mode through the ``__xray_log_register_mode(...)``
+function. One example of this might look like:
+
+::
+
+  #include <xray/xray_interface.h>
+  #include <xray/xray_log_interface.h>
+
+  XRayLogInitStatus exampleLogInit(size_t, size_t, void*, size_t);
+  XRayLogInitStatus exampleFinalize();
+  void exampleHandleArg0(int32_t, XRayEntryType);
+  XRayLogFlushStatus exampleFlushLog();
+
+  static bool Init = [] [[clang::xray_never_instrument]] {
+    auto register_mode = __xray_log_register_mode("example-mode", {
+      exampleLogInit, exampleFinalize, exampleHandleArg0, exampleFlushLog
+    });
+    return register_mode != XRayLogRegisterStatus::XRAY_REGISTRATION_OK;
+  }();
+
+This registration is global, and once a mode has been registered in this manner,
+it can be initialised and used similar to how the provided modes can be used.
+If this code is compiled into a library archive (for static linking) then the
+global initializer for ``Init`` runs before the ``main()`` function. This way
+the modes can be registered just by adding a linker dependency to an XRay
+instrumented binary.
+
+Also note that all the functions that are meant to be invoked by the XRay
+framework are best attributed with the ``[[clang::xray_never_instrument]]``
+C++11 attribute, or using the ``__attribute__((xray_never_instrument))``
+syntax. Since we're using C++11 in the implementation here, we'll be using the
+C++11 style attributes throughout the document.
+
+Example Mode Implementation
+===========================
+
+An example mode that might be useful is a simple function coverage mode
+implementation. This is meant as an exercise in exercising the features we've
+referred to above.
+
+Mode Initialization
+-------------------
+
+The first thing we do for this example implementation is to define the
+initialization function. For the purposes of illustration, we'll use C++ and
+components in the C++ standard library. It's recommended however that modes not
+use or depend on types and functions that may be XRay instrumented to reduce the
+potential overheads of the implementations. The modes that come with the
+compiler-rt project in XRay make these concessions to allow for full portability
+of the implementation.
+
+That said, for our example mode, we'll set up a global vector of booleans sized
+to the number of functions we have instrumented in the running program. To do
+this we use the ``__xray_max_function_id()`` call to determine how many
+functions we should account the coverage information for. We will synchronise
+this global boolean with a mutex for correct concurrent operation -- however, we
+should note that this is only a demonstration, that in general if you are
+implementing a mode that needs to be efficient, you may consider using
+alternative means of avoiding unnecessary contention/overheads when tracing.
+
+Here's one way of implementing this initialization function:
+
+::
+
+  #include <xray/xray_interface.h>
+  #include <xray/xray_log_interface.h>
+  #include <atomic>
+  #include <mutex>
+  #include <vector>
+
+  static std::atomic<int> example_initialized{0};
+  static std::mutex global_mutex;
+  static std::vector<bool> function_coverage;
+  static std::atomic<bool> finalized{false};
+
+  XRayLogInitStatus exampleLogInit(size_t, size_t, void*, size_t)
+  [[clang::xray_never_instrument]] {
+    // TODO: Parse the options if we ever need to support any customisation.
+    int expected = 0;
+    if (!example_initialized.compare_exchange_strong(expected, 1))
+      return expected == 1 ?
+             XRayLogInitStatus::XRAY_LOG_INITIALIZING :
+             XRayLogInitStatus::XRAY_LOG_INITIALIZED;
+
+    std::lock_guard guard(global_mutex);
+
+    // Set up the function coverage with enough storage to represent coverage
+    // information for all XRay instrumented functions.
+    function_coverage.resize(__xray_max_function_id());
+
+    // We reset the contents of the bitmap.
+    function_coverage.clear();
+
+    // Then we set the global state to indicate that we're done initializing
+    // this mode.
+    example_initialized.store(2);
+
+    // We also ensure that we are no longer in a finalized state, if we were
+    // previously finalized.
+    finalized.store(false, std::memory_order_release);
+    return XRayLogInitStatus::XRAY_LOG_INITIALIZED;
+  }
+
+Note that we needed to account for the case when multiple threads might be
+invoking the initialization function at the same time for the same mode.
+
+Handling Function Events
+------------------------
+
+We can then move onto the functions which handle the cases when function events
+are encountered. In this implementation we don't need to differentiate the
+actions for the kinds of events we encounter, but some modes (like the ones that
+come with the compiler-rt XRay runtimes) will want to differentiate function
+entry from exit events, etc.
+
+What we do for function coverage is two-fold:
+
+1. We first mark the function as "covered" in the global bitmap.
+1. We then "unpatch" the function we've just covered.
+
+Again, this implementation is not the most efficient way of doing this, but for
+the sake of brevity and simplicity, this will get us what we need.
+
+::
+
+  thread_local std::atomic<int> reentrance_guard{0};
+
+  void exampleHandleArg0(int32_t function_id, XRayEntryType)
+  [[clang::xray_never_instrument]] {
+    // Prevent recursion into this function either through a signal handler or
+    // if we're calling a function that is also instrumented from within this
+    // handler.
+    if (reentrance_guard.exchange(1, std::memory_order_acq_rel) == 1) return;
+
+    // Stop immediately if the mode has already been finalized.
+    if (finalized.load(std::memory_order_acquire)) return;
+
+    // Hold the global lock to ensure that updates to the coverage bitmap are
+    // thread-safe.
+    std::lock_guard guard(global_mutex);
+    function_coverage[function_id] = 1;
+
+    // We specifically ignore the result of unpatching a specific function.
+    __xray_unpatch_function(function_id);
+
+    // Then we reset the reentrance guard to "0".
+    reentrance_guard.store(0, std::memory_order_release);
+  }
+
+Notice that we're using a thread-local reentrance guard in the function to
+ensure that, even in the event of signal handlers being invoked (or the lock
+implementation being instrumented) we do nothing in case the current thread is
+already handling a function event.
+
+We also check whether the implementation has been finalized before we proceed
+with any modifications we're about to make to global data structures.
+
+
+Finalization and Cleanup
+------------------------
+
+There are several ways we can provide access to the data. In this example, we
+provide this through explicitly writing out a file which contains the bitmap of
+function coverage. This is the simplest thing that will possibly work, but
+sometimes users might want to expose this data through a separate interface (via
+the network or processing the data in-process).
+
+The first part we need to implement is finalization. Notice how when we're
+handling function events, we check first whether the implementation has been
+finalized -- when we are finalizing, we want to ensure that we set a flag that
+stops all threads from attempting to make any more modifications to the global
+state. The finalization routine is also usually the place where we set the
+iterator functions for in-memory processing.
+
+::
+
+  XRayBuffer exampleBufferIterator(XRayBuffer buffer)
+  [[clang::xray_never_instrument]] {
+    static std::string global_buffer;
+    if (buffer.Data == nullptr && buffer.Size == 0) {
+      // This is the first invocation of the iterator. Because our global
+      // structure is a std::vector<bool> we cannot just provide access to the
+      // underlying data. To do this properly we create a single global buffer
+      // and serialize the bitmap into that. This is only for the purposes of
+      // demonstration -- a real implementation may use a more efficient
+      // representation of the bitmap (run-length encoding, compression, etc.).
+      global_buffer.resize(__xray_max_function_id());
+      global_buffer.clear();
+      {
+        std::lock_guard guard(global_mutex);
+        for (const auto fn : function_coverage)
+          global_buffer.push_back(fn ? '1' : '0');
+      }
+      return {global_buffer.c_str(), global_buffer.size()};
+    }
+
+    if (buffer.Data == global_buffer.data() &&
+        buffer.Size == global_buffer.size()) {
+      // Clear the buffer, since we no longer need it.
+      std::string().swap(global_buffer);
+      return {nullptr, 0};
+    }
+
+    // If we get here we've encountered a bug.
+    assert(false && "Invalid argument to exampleBufferIterator!");
+  }
+
+  XRayLogInitStatus exampleFinalize() [[clang::xray_never_instrument]] {
+    // Set the global flag to "finalized", and if it's already been finalized
+    // before this call, we do nothing else. This way, only the first thread
+    // which successfully sets the finalized global flag from 0 -> 1 gets to
+    // proceed.
+    if (finalized.exchange(1, std::memory_order_acq_rel) == 1)
+      XRayLogInitStatus::XRAY_LOG_FINALIZED;
+
+    // We then set up the buffer iterators to allow us to access the global
+    // state when users call __xray_log_process_buffers(...).
+    __xray_log_set_buffer_iterator(exampleBufferIterator);
+
+    return XRayLogInitStatus::XRAY_LOG_FINALIZED;
+  }
+
+The next part we need to implement is the log flushing function. Because we
+already have a means of accessing the data through a serialized buffer, we can
+just have that implementation write out the data to a file. In this
+implementation we pick an arbitrary filename, but you can imagine using various
+approaches for configuring the filename of the output instead.
+
+::
+
+  XRayLogFlushStatus exampleFlushLog() [[clang::xray_never_instrument]] {
+    if (!finalized.load(std::memory_order_acquire))
+      return XRayLogFlushStatus::XRAY_LOG_NOT_FLUSHING;
+
+    static std::atomic<bool> flushing{false};
+    if (flushing.exchange(true, std::memory_order_acq_rel))
+      return XRayLogFlushStatus::XRAY_LOG_FLUSHING;
+
+    // Open a file and write out the contents there. We use a function-local
+    // static pointer so that it can be accessed as a global variable by the
+    // lambda we provide to the call to __xray_log_process_buffers(...).
+    // This is necessary because __xray_log_process_buffers(...) takes a
+    // function pointer, and in C++ we cannot convert a lambda with
+    // captured state into a function pointer.
+    static FILE* tmpf = 0;
+    tmpf = std::tmpfile();
+    auto result = __xray_log_process_buffers(
+      +[](const char *, XRayBuffer buffer) {
+        // write out the data raw into the temporary file.
+        std::fputs(buffer.Buffer, tmpf);
+      });
+    fclose(tmpf);
+
+    // Indicate that we are no longer flushing.
+    flushing.store(false, std::memory_order_release);
+
+    // Now that we've flushed the contents of the global data structure, we then
+    // re-set the global "initialized" flag to 0, which indicates we can now be
+    // re-initialized.
+    example_initialized.store(0, std::memory_order_release);
+    return result;
+  }
+
+Further Work
+============
+
+There are two main ways of controlling an XRay instrumented binary at the
+moment: through environment variables described in the `XRay Documentation
+<XRay.html>` and programmatically as described above. In the future we may
+consider providing a helper library which dedicates a thread of execution, or
+uses ``ptrace`` or similar functionality to control the XRay tracing framework
+externally. That should make enabling/disabling XRay more convenient than
+having to bake in the control surface to an application/binary's code.
+
+There is also potential for exposing some of the helper types and functions
+used by the XRay runtimes, to make it more convenient to implement modes more
+correctly.
+
+Contact Information
+===================
+
+Please send questions on details of implementing your own XRay modes to the
+`llvm-dev mailing list <https://lists.llvm.org/mailman/listinfo/llvm-dev>`.
Index: llvm/docs/index.rst
===================================================================
--- llvm/docs/index.rst
+++ llvm/docs/index.rst
@@ -300,6 +300,7 @@
    XRay
    XRayExample
    XRayFDRFormat
+   XRayFramework
    PDB/index
    CFIVerify
    SpeculativeLoadHardening
@@ -431,6 +432,10 @@
 :doc:`XRayExample`
   An example of how to debug an application with XRay.
 
+:doc:`XRayFramework`
+  A detailed walk-through of how to plug in a new XRay mode through the XRay
+  Framework.
+
 :doc:`The Microsoft PDB File Format <PDB/index>`
   A detailed description of the Microsoft PDB (Program Database) file format.