Index: docs/CoverageMappingFormat.rst
===================================================================
--- /dev/null
+++ docs/CoverageMappingFormat.rst
@@ -0,0 +1,458 @@
+.. role:: raw-html(raw)
+   :format: html
+
+=================================
+LLVM Code Coverage Mapping Format
+=================================
+
+.. contents::
+   :local:
+
+Abstract
+========
+
+This document describes the LLVM code coverage mapping format,
+the way the coverage mapping data is stored in the LLVM IR,
+and the way the coverage mapping data is encoded.
+
+Introduction
+============
+
+The LLVM coverage mapping format is used to provide code coverage
+analysis using LLVM's and Clang's instrumentation-based profiling. It's designed
+to be a self-contained data format that can be embedded into the LLVM IR and
+object files.
+
+The coverage mapping format aims to be a "universal format" that would be
+suitable for usage by any frontend, and not just by Clang. It also aims to
+let the frontend generate minimal coverage mapping
+data in order to reduce the size of the IR and object files - for example,
+instead of emitting mapping information for each statement in a function, the
+frontend is allowed to group the statements with the same execution count into
+regions of code, and emit the mapping information only for those regions.
+
+High Level Overview
+===================
+
+LLVM's coverage mapping format operates on a per-function level as the
+profile instrumentation counters are associated with a specific function.
+Each function that requires code coverage has to create
+coverage mapping data that can map between the source code ranges and
+the profile instrumentation counters for that function.
+
+Mapping Region
+--------------
+
+The function's coverage mapping data contains an array of mapping regions.
+A mapping region stores the `source code range`_ that is covered by this region,
+the `file id <coverage file id>`_, the `coverage mapping counter`_ and
+the region's kind.
+There are several kinds of mapping regions:
+
+* Code regions associate portions of source code and `coverage mapping
+  counters`_.
+  For example:
+
+  :raw-html:`
int main(int argc, const char *argv[]) {    // Code Region from 1:40 to 9:2
+                                              
+    if (argc > 1) {                           // Code Region from 3:17 to 5:4
+      printf("%s\n", argv[1]);                
+    } else {                                  // Code Region from 5:10 to 7:4
+      printf("\n");                           
+    }                                         
+    return 0;                                 
+  }                                           
+  
`
+* Skipped regions are used to represent source ranges that were skipped
+  by Clang's preprocessor.
+  For example:
+
+  :raw-html:`
int main() {                // Code Region from 1:12 to 6:2
+  #ifdef DEBUG                // Skipped Region from 2:1 to 4:2
+    printf("Hello world");    
+  #endif                      
+    return 0;                 
+  }                           
+  
`
+* Expansion regions are used to represent Clang's macro expansions.
+  For example:
+
+  :raw-html:`
int func(int x) {                              
+    #define MAX(x,y) ((x) > (y)? (x) : (y))      
+    return MAX(x, 42);                           // Expansion Region from 3:10 to 3:13
+  }                                              
+  
`
+
+.. _source code range:
+
+Source Range:
+^^^^^^^^^^^^^
+
+The source range record contains the starting and ending location of a certain
+mapping region. Both locations include the line and the column numbers.
+
+.. _coverage file id:
+
+File ID:
+^^^^^^^^
+
+The file id is an integer value that tells us
+in which source file or macro expansion this region is located.
+It enables Clang to produce mapping information for the code
+defined inside macros, as this example demonstrates:
+
+:raw-html:`
void func(const char *str) {         // Code Region from 1:28 to 6:2 with file id 0
+  #define PUT printf("%s\n", str)    // 2 Code Regions from 2:15 to 2:34 with file ids 1 and 2
+  if(*str)                           
+    PUT;                             // Expansion Region from 4:5 to 4:8 with file id 0 that expands a macro with file id 1
+  PUT;                               // Expansion Region from 5:3 to 5:6 with file id 0 that expands a macro with file id 2
+}                                    
+
`
+
+.. _coverage mapping counter:
+.. _coverage mapping counters:
+
+Counter:
+^^^^^^^^
+
+A coverage mapping counter can represent a reference to the profile
+instrumentation counter. The execution count for a region with such a counter
+is determined by looking up the value of the corresponding profile
+instrumentation counter.
+
+It can also represent a binary arithmetic expression that operates on
+coverage mapping counters or other expressions.
+The execution count for a region with an expression counter is determined by
+evaluating the expression's arguments and then adding them together or
+subtracting them from one another.
+In the example below, a subtraction expression is used to compute the execution
+count for the compound statement that follows the *else* keyword:
+
+:raw-html:`
int main(int argc, const char *argv[]) {    // Region's counter is a reference to the profile counter #0
+                                            
+  if (argc > 1) {                           // Region's counter is a reference to the profile counter #1
+    printf("%s\n", argv[1]);                
+  } else {                                  // Region's counter is an expression (reference to the profile counter #0 - reference to the profile counter #1)
+    printf("\n");                           
+  }                                         
+  return 0;                                 
+}                                           
+
`
+
+Finally, a coverage mapping counter can also represent an execution count
+of zero. The zero counter is used to provide coverage mapping for
+unreachable statements and expressions, as in the example below:
+
+:raw-html:`
int main() {                   
+  return 0;                    
+  printf("Hello world!\n");    // Unreachable region's counter is zero
+}                              
+
`
+
+LLVM IR Representation
+======================
+
+The coverage mapping data is stored in the LLVM IR using a single global
+constant structure variable called __llvm_coverage_mapping
+with the *__llvm_covmap* section specifier.
+
+For example, let’s consider a C file and how it gets compiled to LLVM:
+
+.. code-block:: c
+
+  int foo() {
+    return 42;
+  }
+  int bar() {
+    return 13;
+  }
+
+The coverage mapping variable generated by Clang is:
+
+.. code-block:: llvm
+
+  @__llvm_coverage_mapping = internal constant { i32, i32, i32, i32, [2 x { i8*, i32, i32 }], [48 x i8] }
+  { i32 2,  ; The number of function records
+    i32 23, ; The length of the string that contains the encoded translation unit filenames
+    i32 25, ; The length of the string that contains the encoded coverage mapping data
+    i32 0,  ; Coverage mapping format version
+    [2 x { i8*, i32, i32 }] [ ; Function records
+     { i8*, i32, i32 } { i8* getelementptr inbounds ([3 x i8]* @__llvm_profile_name_foo, i32 0, i32 0), ; Function's name
+       i32 3, ; Function's name length
+       i32 9  ; Function's encoded coverage mapping data string length
+     },
+     { i8*, i32, i32 } { i8* getelementptr inbounds ([3 x i8]* @__llvm_profile_name_bar, i32 0, i32 0), ; Function's name
+       i32 3, ; Function's name length
+       i32 9  ; Function's encoded coverage mapping data string length
+     }],
+    [48 x i8] c"..." ; Encoded data (not shown)
+  }, section "__DATA,__llvm_covmap", align 8
+
+Version:
+--------
+
+The coverage mapping version number can have the following values:
+
+* 0 - The first (current) version of the coverage mapping format.
+
+Function record:
+----------------
+
+A function record is a structure of the following type:
+
+.. code-block:: llvm
+
+  { i8*, i32, i32 }
+
+It contains the pointer to the function's name, function's name length,
+and the length of the encoded mapping data for that function.
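As a rough illustration of how the record fields might be used, the C sketch below mirrors the `{ i8*, i32, i32 }` record and computes where each function's encoded mapping data would start inside the encoded data string. It assumes the layout given in the "Encoded data" section: filenames first, then each function's data in record order, with the total padded to a multiple of 8. The names `FunctionRecord`, `mapping_data_offset` and `padded_size` are hypothetical and are not part of LLVM.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of the { i8*, i32, i32 } function record. */
struct FunctionRecord {
  const char *name;     /* pointer to the function's name */
  uint32_t name_length; /* length of the function's name */
  uint32_t data_length; /* length of the function's encoded mapping data */
};

/* Offset of the mapping data for records[index] inside the encoded data
 * string, assuming the filenames come first and the per-function data
 * follows in record order. Passing index == num_records yields the end
 * of the last function's data. */
static uint32_t mapping_data_offset(uint32_t filenames_length,
                                    const struct FunctionRecord *records,
                                    size_t num_records, size_t index) {
  assert(index <= num_records);
  uint32_t offset = filenames_length;
  for (size_t i = 0; i < index; ++i)
    offset += records[i].data_length;
  return offset;
}

/* Size of the encoded data string once it is rounded up to a multiple of 8. */
static uint32_t padded_size(uint32_t filenames_length,
                            const struct FunctionRecord *records,
                            size_t num_records) {
  uint32_t end = mapping_data_offset(filenames_length, records, num_records,
                                     num_records);
  return (end + 7u) / 8u * 8u;
}
```

With the values from the example above (a filenames string of length 23 and two records with a data length of 9 each), `mapping_data_offset` gives 23 for `foo` and 32 for `bar`, and `padded_size` gives 48, which matches the size of the `[48 x i8]` array.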
+
+Encoded data:
+-------------
+
+The encoded data is stored in a single string that contains
+the encoded filenames used by this translation unit and the encoded coverage
+mapping data for each function in this translation unit.
+
+The encoded data has the following structure:
+
+`[filenames, coverageMappingDataForFunctionRecord0, coverageMappingDataForFunctionRecord1, ..., padding]`
+
+If necessary, the encoded data is padded with zeroes so that the size
+of the data string is rounded up to the nearest multiple of 8.
+
+Encoding
+========
+
+The function's coverage mapping data is encoded as a stream of bytes,
+with a simple structure. The structure consists of the encoding
+`primitives <cvmprimitives>`_, such as variable-length unsigned integers, that
+are used to encode `file id mapping`_, `Counter Expressions`_ and
+the `Mapping Regions`_.
+
+The format of the structure follows:
+  ``[file id mapping, counter expressions, mapping regions]``
+
+.. _cvmprimitives:
+
+Primitives
+----------
+
+.. _unsigned integer:
+.. _unsigned integers:
+
+Unsigned Integers
+^^^^^^^^^^^^^^^^^
+
+Unsigned integer (uint) values are 32-bit integers that are encoded
+using DWARF's LEB128 encoding, optimizing for the case where values are small
+(1 byte for values less than 128).
+
+.. _strings:
+
+Strings
+^^^^^^^
+
+:raw-html:``
+[length\ :sub:`uint`, characters...]
+:raw-html:``
+
+String values are encoded using `unsigned integers`_ for the length
+of the string and the sequence of bytes for its characters.
+
+.. _file id mapping:
+
+File ID Mapping
+---------------
+
+:raw-html:``
+[numIndices\ :sub:`uint`, filenameIndex0\ :sub:`uint`, filenameIndex1\ :sub:`uint`, ...]
+:raw-html:``
+
+File id mapping in a function's coverage mapping stream
+contains the indices into the translation unit's filenames array.
+
+Counter
+-------
+
+:raw-html:``
+[value\ :sub:`uint`]
+:raw-html:``
+
+A `coverage mapping counter`_ is stored as a single `unsigned integer`_ value.
+The value uses the following encoding:
+
+:raw-html:``
+[tag\ :sub:`2`, data\ :sub:`30`]
+:raw-html:``
+
+This value contains
+two bit fields --- the `tag <counter-tag>`_
+which is stored in the lowest 2 bits,
+and the `counter data`_ which is stored in the remaining bits.
+
+.. _counter-tag:
+
+Tag:
+^^^^
+
+The counter's tag encodes the counter's kind
+and, if the counter is an expression, the expression's kind.
+The possible tag values are:
+
+* 0 - The counter is zero.
+
+* 1 - The counter is a reference to the profile instrumentation counter.
+
+* 2 - The counter is a subtraction expression.
+
+* 3 - The counter is an addition expression.
+
+.. _counter data:
+
+Data:
+^^^^^
+
+The counter's data is interpreted in the following manner:
+
+* When the counter is a reference to the profile instrumentation counter,
+  then the counter's data is the id of the profile counter.
+* When the counter is an expression, then the counter's data
+  is the index into the array of counter expressions.
+
+.. _Counter Expressions:
+
+Counter Expressions
+-------------------
+
+:raw-html:``
+[numExpressions\ :sub:`uint`, expr0LHS\ :sub:`Counter`, expr0RHS\ :sub:`Counter`, expr1LHS\ :sub:`Counter`, expr1RHS\ :sub:`Counter`, ...]
+:raw-html:``
+
+Counter expressions consist of two counters as they
+represent binary arithmetic operations.
+
+.. _Mapping Regions:
+
+Mapping Regions
+---------------
+
+:raw-html:``
+[numRegionArrays\ :sub:`uint`, regionsForFile0, regionsForFile1, ...]
+:raw-html:``
+
+The mapping regions are stored in an array of sub-arrays where every
+region in a particular sub-array has the same file id.
+
+The file id for a sub-array of regions is the index of that
+sub-array in the main array, e.g. the first sub-array will have the file id
+of 0.
+
+Sub-Array of Regions
+^^^^^^^^^^^^^^^^^^^^
+
+:raw-html:``
+[numRegions\ :sub:`uint`, region0, region1, ...]
+:raw-html:``
+
+The mapping regions for a specific file id are stored in an array that is
+sorted in ascending order by the region's starting location.
+
+Mapping Region
+^^^^^^^^^^^^^^
+
+``[header, source range]``
+
+The mapping region record contains two sub-records ---
+the `header`_, which stores the counter and/or the region's kind,
+and the `source range`_ that contains the starting and ending
+location of this region.
+
+.. _header:
+
+Header
+^^^^^^
+
+``[counter]``
+
+or
+
+``[pseudo-counter]``
+
+The header encodes the region's counter and the region's kind.
+
+The value of the counter's tag distinguishes between the pseudo-counters and
+counters --- if the tag is zero, then this header contains a
+pseudo-counter, otherwise this header contains an ordinary counter.
+
+Counter:
+""""""""
+
+A mapping region whose header has a counter with a non-zero tag is
+a code region.
+
+Pseudo-Counter:
+"""""""""""""""
+
+:raw-html:``
+[value\ :sub:`uint`]
+:raw-html:``
+
+A pseudo-counter is stored as a single `unsigned integer`_ value, just like
+the ordinary counter.
+The value uses the following encoding:
+
+:raw-html:``
+[tag\ :sub:`2`, expansionRegionTag\ :sub:`1`, data\ :sub:`29`]
+:raw-html:``
+
+This value has the following interpretation:
+
+* bits 0-1: tag, which is always 0.
+
+* bit 2: expansionRegionTag. If this bit is set, then this mapping region
+  is an expansion region.
+
+* bits 3-31: data. If this region is an expansion region, then the data
+  contains the expanded file id of that region.
+
+  Otherwise, the data contains the region's kind. The possible region
+  kind values are:
+
+  * 0 - This mapping region is a code region with a counter of zero.
+  * 2 - This mapping region is a skipped region.
+
+.. _source range:
+
+Source Range
+^^^^^^^^^^^^
+
+:raw-html:``
+[deltaLineStart\ :sub:`uint`, columnStart\ :sub:`uint`, numLines\ :sub:`uint`, columnEnd\ :sub:`uint`]
+:raw-html:``
+
+The source range record contains the following fields:
+
+* *deltaLineStart*: The difference between the starting line of the
+  current mapping region and the starting line of the previous mapping region.
+
+  If the current mapping region is the first region in the current
+  sub-array, then it stores the starting line of that region.
+
+* *columnStart*: The starting column of the mapping region.
+
+* *numLines*: The difference between the ending line and the starting line
+  of the current mapping region.
+
+* *columnEnd*: The ending column of the mapping region.
+
+Filenames
+---------
+
+:raw-html:``
+[numFilenames\ :sub:`uint`, filename0\ :sub:`string`, filename1\ :sub:`string`, ...]
+:raw-html:``
+
+The translation unit's filenames are stored in a stream of bytes and are encoded
+using the `primitives <cvmprimitives>`_ from the function's coverage mapping
+data encoding.
Index: docs/index.rst
===================================================================
--- docs/index.rst
+++ docs/index.rst
@@ -238,6 +238,7 @@
    StackMaps
    InAlloca
    BigEndianNEON
+   CoverageMappingFormat
 
 :doc:`WritingAnLLVMPass`
    Information on how to write LLVM transformations and analyses.
@@ -324,6 +325,8 @@
    LLVM's support for generating NEON instructions on big endian ARM targets is
    somewhat nonintuitive. This document explains the implementation and rationale.
 
+:doc:`CoverageMappingFormat`
+  This describes the format and encoding used for LLVM’s code coverage mapping.
 
 Development Process Documentation
 =================================