diff --git a/llvm/docs/AMDGPULLVMExtensionsForHeterogeneousDebugging.rst b/llvm/docs/AMDGPULLVMExtensionsForHeterogeneousDebugging.rst new file mode 100644 --- /dev/null +++ b/llvm/docs/AMDGPULLVMExtensionsForHeterogeneousDebugging.rst @@ -0,0 +1,2696 @@ +=================================================== +AMDGPU LLVM Extensions for Heterogeneous Debugging +=================================================== + +.. contents:: + :local: + +.. warning:: + + This section describes **provisional support** for AMDGPU LLVM debug + information that is not currently fully implemented and is subject to change. + +Introduction +============ + +As described in the :doc:`AMDGPUDwarfExtensionsForHeterogeneousDebugging` (the +"DWARF extensions"), AMD has been working to support debugging of heterogeneous +programs. This document describes changes to the LLVM representation of debug +information (the "LLVM extensions") required to support the DWARF extensions. +These LLVM extensions continue to support previous versions of the DWARF +standard, including DWARF 5 without extensions, as well as other debug formats +which LLVM currently supports, such as CodeView. + +The LLVM extensions do not constitute a direct implementation of all concepts +from the DWARF extensions, although wherever reasonable the fundamental aspects +were kept identical. The concepts defined in the DWARF extensions which are used +directly in the LLVM extensions with their semantics unchanged are enumerated in +the :ref:`amdgpu-llvm-debug-external-definitions` section below. + +A significant departure from the DWARF extensions is in the consolidation of +expression evaluation stack entries. In the DWARF extensions, each entry on the +expression evaluation stack contains either a typed value or an untyped location +description. In the LLVM extensions, each entry on the expression evaluation +stack instead contains a pair of a location description and a type. 
+ +Additionally, the concept of a "generic type", used as a default when a type is +needed but not stated explicitly, is eliminated. Together, these changes imply +that the concrete set of operations available differs between the DWARF and LLVM +extensions. + +These changes were made to remove redundant representations of semantically +equivalent expressions, which can simplify the compiler’s work in updating debug +information expressions to reflect code transformations. The LLVM extensions’ +changes are possible as LLVM has no requirement for backwards compatibility, nor +any requirement that the intermediate representation of debug information +conform to any particular external specification. Consequently, the LLVM +extensions are able to increase the accuracy of existing debug information, +while also extending the debug information to cover cases which were previously +not described at all. + +High-Level Goals +================ + +There are several specific cases where the LLVM extensions’ approach can allow +for more accurate or more complete debug information than would be feasible with +only incremental changes to the existing approach. + +- Support describing the location of induction variables. LLVM has recently + gained partial support for expressions which depend on multiple LLVM values, + although it is currently limited exclusively to a subset of cases for + induction variables. This support is also inherently + limited as it can only refer directly to LLVM values, not to source variables + symbolically. This means it is not possible to describe an induction variable + which, for example, depends on a variable whose location is not static over + the whole lifetime of the induction variable. +- Support describing the location of arbitrary expressions over scalar-replaced + aggregate values, even in the face of other dependent expressions. 
LLVM + currently drops debug information when any expression would depend on a + composite value. +- Support describing all locations of values which are live in multiple machine + locations at the same instruction. LLVM currently picks only one such + location to describe. This means values which are resident in multiple places + need to be conservatively marked read-only, even when they could be + read-write if all of their locations were reported accurately. +- Accurately support describing the range over which a given location is + active. LLVM currently pessimizes debug information as there is no rigorous + means to limit the range of a described location. +- Support describing the factoring of expressions. This allows features such as + DWARF procedures to be used to reduce the size of debug information. + Factoring can also be more convenient for the compiler to describe lexically + nested information such as program location for inactive lanes in divergent + control flow. + +Motivation +========== + +The original motivation for the LLVM extensions was to make the minimum required +changes to the existing LLVM representation of debug information needed to +support the :doc:`AMDGPUDwarfExtensionsForHeterogeneousDebugging`. This involved +an evaluation of the existing debug information for machine locations in LLVM, +which uncovered some hard-to-fix bugs rooted in the incidental complexity and +inconsistency of LLVM’s debug intrinsics and expressions. + +Attempting to address these bugs in the existing framework proved more difficult +than expected. It became apparent that the shortcomings of the existing solution +were a direct consequence of the complexity, ambiguity, and lack of +composability encountered in DWARF. + +With this in mind, we revisited the DWARF extensions to see if they could inform +a more tractable design for LLVM. 
We had already worked to address the +complexity and ambiguity of DWARF by defining a formalization for its expression +language and improved the composability by unifying values and location +descriptions on the evaluation stack. Together, these changes also increased the +expressiveness of DWARF. Using similar ideas in LLVM allowed us to support +additional real world cases and describe existing cases with greater accuracy. + +This led us to start from the DWARF extensions and design a new set of debug +information representations. This was very heavily influenced by prior art in +LLVM, existing RFCs, mailing list discussions, review comments, and bug reports, +without which we would not have been able to make this proposal. Some of the +influences include: + +- The use of intrinsics to capture local LLVM values keeps the proposal close + to the existing implementation, and limits the incidental work needed to + support it for the reasons outlined in `[LLVMdev] [RFC] Separating Metadata + from the Value hierarchy + `__. +- Support for debug locations which depend on multiple LLVM values is required + by several optimizations, including expressing induction variables, which is + the motivation for `D81852 [DebugInfo] Update MachineInstr interface to + better support variadic DBG_VALUE instructions + `__. +- Our solution also generalizes the notion of "fragments" to support composing + with arbitrary expressions. For example, fragmentation can be represented + even in the presence of arithmetic operators, as occurs in `D70601 Disallow + DIExpressions with shift operators from being fragmented + `__. +- The desire to support multiple concurrent locations for the same variable is + described in detail in `[llvm-dev] Proposal for multi location debug info + support in LLVM IR + `__ + (continued at `[llvm-dev] Proposal for multi location debug info support in + LLVM IR + `__) and + `Multi Location Debug Info support for LLVM + `__. 
Support for + overlapping location list entries was added in DWARF 5. +- Bugs, like `Bug 40628 - [DebugInfo@O2] Salvaged memory loads can observe + subsequent memory writes `__, + which was partially worked around in `D57962 [DebugInfo] PR40628: Don’t + salvage load operations `__, often result + from passes being unable to accurately represent the relationship between + source variables. Our approach supports encoding that information in debug + information in a mechanical way, with straightforward semantics. +- Use of ``distinct`` for our new metadata nodes is motivated by use cases + similar to those in `[LLVMdev] [RFC] Separating Metadata from the Value + hierarchy (David Blaikie) + `__ + where the content of a node is not sufficient context to unique it. + +The least error prone place to make changes to debug information is at the point +where the underlying code is being transformed, hence the LLVM extensions’ +representation is biased for this case. + +The expression evaluation stack contains uniform pairs of location description +and type, such that all operations have well-defined semantics and no +side-effects on the evaluation of the surrounding expression. These same +semantics apply equally throughout the compiler. This allows for referentially +transparent updates, which can be reasoned about in the context of a single +operation and its inputs and outputs, rather than the space of all possible +surrounding operations and dependent expressions. + +By eliminating any implicit expression inputs or operations and constraining the +state space of expressions using well-formedness rules, it is unambiguous +whether a given transformation is valid and semantics-preserving, without ever +having to consider anything outside of the expression itself. + +Designing around a separation of concerns regarding expression modification and +simplification allows each update to the debug information to introduce +redundant or sub-optimal expressions. 
To address this, an independent +"optimizer" can simplify and canonicalize expressions. As the expression +semantics are well-defined, an "optimizer" can be run without specific +knowledge of the changes made by any one pass or combination of passes. + +Incorporating a means to express "factoring", or the definition of one +expression in terms of one or more other expressions, makes "shallow" updates +possible, bounding the work needed for any given update. This factoring is +usually trivial at the time the expression is created, but expensive to infer +later. Factored expressions can result in more compact debug information by +leveraging dynamic calling of DWARF procedures in DWARF 5, and we expect to be +able to use factoring for other purposes, such as debug information for +divergent control flow (see :ref:`amdgpu-dwarf-dw-at-llvm-lane-pc`). It is +possible to statically "flatten" this factored representation later, if +required by the debug information format being emitted, or if the emitter +determines it would be more profitable to do so. + +Leveraging the DWARF extensions as a foundation, the concept of a location +description is used as the fundamental means of recording debug information. To +support this, each LLVM entity which can be referenced by an expression has a +well-defined location description, and is referred to by expressions in an +explicit, referentially transparent manner. This makes updates to reflect +changes in the underlying LLVM representation mechanical, robust, and simple. +Due to factoring, these updates are also more localized, as updates to an +expression are transparently reflected in all dependent expressions without +having to traverse them, or even be aware of their existence. + +Without this factoring, any changes to an LLVM entity used as an input to one +or more expressions would require "macro-expansion" at the time they are made, +in each place they are referenced. 
This in turn inhibits the valid +transformations the context-insensitive "optimizer" can safely perform, as +perturbing the macro-expanded expression for an LLVM entity makes it impossible +to reflect future changes to that entity in the expression. Even if this is +considered acceptable, once expressions begin to depend on other expressions +(for example, in the description of induction variables, where one program +object depends on multiple other program objects) there is no longer a bound on +the recursive depth of expressions which need to be visited for any given +update, making even simple updates expensive in terms of compiler resources. +Furthermore, this approach requires either a combinatorial explosion of +expressions to describe cases when the live ranges of multiple program objects +are not equal, or the dropping of debug information for all but one such +object. None of these tradeoffs were considered acceptable. + +Changes from LLVM Language Reference Manual +=========================================== + +This section describes a provisional set of changes to the :doc:`LangRef` to +support the :doc:`AMDGPUDwarfExtensionsForHeterogeneousDebugging`. It is not +currently fully implemented and is subject to change. + +.. _amdgpu-llvm-debug-external-definitions: + +External Definitions +-------------------- + +Some required concepts are defined outside of this document. We reproduce some +parts of those definitions, along with some expansion on their relationship to +this proposal and any extensions. + +Well-Formed +~~~~~~~~~~~ + +The definition of "well-formed" is the one from the :ref:`LLVM Language +Reference Manual `. + +Type +~~~~ + +The definition of "type" is the one from the :ref:`LLVM Language Reference +Manual `. + +Value +~~~~~ + +The definition of "value" is the one from the :doc:`LangRef`. 
+ +Location Description +-------------------- + +The definitions of "location description", "single location description", and +"location storage" are the ones from the section titled +:ref:`amdgpu-dwarf-location-description` in the DWARF Extensions For +Heterogeneous Debugging. + +A location description can consist of one or more single location descriptions. +A single location description specifies a location storage and bit offset. A +location storage is a linear stream of bits with a fixed size. + +The storage encompasses memory, registers, and literal/implicit values. + +Zero or more single location descriptions may be active for a location +description at the same instruction. + +LLVM Debug Information Expressions +---------------------------------- + +*[Note: LLVM expressions derive much of their semantics from the DWARF +expressions described in the* :ref:`amdgpu-dwarf-expressions`\ *.]* + +LLVM debug information expressions ("LLVM expressions") specify a typed +location. *[Note: Unlike DWARF expressions, they cannot directly describe how to +compute a value. Instead, they are able to describe how to define an implicit +location description for a computed value.]* + +If the evaluation of an LLVM expression does not encounter an error, then it +results in exactly one pair of location description and type. + +If the evaluation of an LLVM expression encounters an error, the result is an +evaluation error. + +If an LLVM expression is not well-formed, then the result is undefined. + +The following sections detail the rules for when a LLVM expression encounters +an error or is not well-formed. + +LLVM Expression Evaluation Context +---------------------------------- + +An LLVM expression is evaluated in a context that includes the same context +elements as described in :ref:`amdgpu-dwarf-expression-evaluation-context` with +the following exceptions. The *current result kind* is not applicable as all +LLVM expressions are location descriptions. 
The *current object* and *initial +stack* are not applicable as LLVM expressions have no implicit inputs. + +Location Descriptions Of LLVM Entities +-------------------------------------- + +The notion of location storage is extended to include the abstract LLVM entities +of *values*, *global variables*, *stack slots*, *virtual registers*, and +*physical registers*. In each case the location storage conceptually holds the +value of the corresponding entity. + +For global variables, the location storage corresponds to the SSA value for the +address of the global variable as is the case when referenced in LLVM IR. + +In addition, an implicit address location storage kind is defined. The size of +the storage matches the size of the type for the address. The value in the +storage is only meaningful when used in its entirety by a ``DIOpDeref`` +operation, which yields a location description for the entity that the address +references. *[Note: This is a generalization to the implicit pointer location +description of DWARF 5.]* + +Location descriptions can be associated with instances of any of these location +storage kinds. + +High Level Structure +-------------------- + +Global Variable +~~~~~~~~~~~~~~~ + +The definition of "global variable" is the one from the :ref:`globalvars` with +the following addition. + +.. TODO:: + + Should this explicitly state that only zero or one such ``dbg.def`` + attachment is well formed? + +The optional ``dbg.def`` metadata attachment can be used to specify a +``DIFragment`` termed a global variable fragment. The location description of a +global variable fragment is a memory location description for a pointer to the +global variable that references it. + +If a global variable fragment is referenced by more than one global variable +``dbg.def`` field, then it is not well-formed. If a global variable fragment is +referenced by the ``object`` field of a ``DILifetime`` then it is not +well-formed. 
+ +*[Note: Global variables in LLVM exist for the duration of the program. The +global variable fragment can be referenced by the* ``argObjects`` *field of a +computed lifetime segment to specify the location for a* ``DIGlobalVariable`` +*for that entire program duration. However, the global variable may exist in a +different location for a given part of the subprogram. This can be expressed +using bounded lifetime segments for the* ``DIGlobalVariable``\ *. If the +computed lifetime segment is specified, it only applies for the program +locations not covered by a bounded lifetime segment. If the computed lifetime +segment is not specified, and no bounded lifetime segment covers the program +location, then the* ``DIGlobalVariable`` *location is the undefined location +description for that program location. The bounded lifetime segments of a* +``DIGlobalVariable`` *can also reference the global variable fragment. This +allows the same LLVM global variable to be used for different* +``DIGlobalVariable``\ *s over different program locations.]* + +.. TODO:: + + Should there be a separate ``DIGlobalFragment`` for this since it is not + allowed to have any bounded lifetime segments referencing it? Or should a + ``DIFragment`` have a ``kind`` field that indicates if it is a ``computed``, + ``bounded``, or ``global`` fragment? + +.. + +.. TODO:: + + Should the global variable fragment be the location description of the LLVM + global variable rather than an implicit location description that is a + pointer to it? That would avoid needing the ``DIOpDeref`` when referencing + the global variable fragment. It seems ``DIOpAddrOf`` can be used if the + address is needed, and all other uses need the location description of the actual LLVM + global variable. But DWARF has limitations in supporting ``DIOpAddrOf`` due + to limitations in creating implicit pointer location descriptions. + +Metadata +-------- + +Some metadata nodes below are defined as being "abstract". 
An abstract metadata +node exists only to abstractly specify common aspects of derived node types, +and to refer to those derived node types generally. Abstract node types cannot +be created directly. + +.. _amdgpu-llvm-debug-diobject: + +``DIObject`` +~~~~~~~~~~~~ + +A ``DIObject`` is an abstract metadata node that represents the identity of a +program object used to hold data. There are several kinds of program objects. + +``DIVariable`` +^^^^^^^^^^^^^^ + +A ``DIVariable`` is a ``DIObject``, which represents the identity of a source +language program variable or non-source language program variable. + +A non-source language program variable includes ``DIFlagArtificial`` in the +``flags`` field. + +*[Note: A non-source language program variable may be introduced by the +compiler. These may be used in expressions needed for describing debugging +information required by the debugger.]* + +*[Example: An implicit variable needed for calculating the size of a dynamically +sized array.]* + +``DIGlobalVariable`` +'''''''''''''''''''' + +A ``DIGlobalVariable`` is a ``DIVariable``, which represents the identity of a +global variable. See :ref:`DIGlobalVariable`. + +``DILocalVariable`` +''''''''''''''''''' + +A ``DILocalVariable`` is a ``DIVariable``, which represents the identity of a +local variable. See :ref:`DILocalVariable`. + +``DIFragment`` +^^^^^^^^^^^^^^ + +.. code:: llvm + + distinct !DIFragment() + +A ``DIFragment`` is a ``DIObject``, which represents the identity of a location +description that can be used as a piece of another location description. 
+ +*[Note: Unlike a* ``DIVariable``\ *, a* ``DIFragment`` *is not named and so is +not directly exposed to the user of a debugger.]* + +*[Note: A* ``DIFragment`` *may be a piece of a* ``DIVariable`` *directly, or +indirectly by virtue of being a piece of some other* ``DIFragment``\ *.]* + +*[Note: A* ``DIFragment`` *may be introduced to factor the definition of part of +a location description shared by other location descriptions for convenience or +to permit more compact debug information.]* + +*[Note: A* ``DIFragment`` *may be introduced to allow the compiler to specify +multiple lifetime segments for the single location description referenced for a +default or type lifetime segment.]* + +*[Note: In DWARF a* ``DIFragment`` *can be represented using a* +``DW_TAG_dwarf_procedure`` *DIE.]* + +*[Example: The fragments into which SRoA splits a source language variable. The +location description of the source language variable would then use an +expression that combines the fragments appropriately.]* + +*[Example: Divergent control flow can be described by factoring information +about how to determine active lanes by lexical scope, which results in more +compact debug information.]* + +*[Note:* ``DIFragment`` *replaces using* ``DW_OP_LLVM_fragment`` *in the current +LLVM IR* ``DIExpression`` *operations. This simplifies updating expressions +which now purely describe the location description.]* + +``DICode`` +~~~~~~~~~~ + +A ``DICode`` is an abstract metadata node that represents the identity of a +program code location. There are several kinds of program code locations. + +``DILabel`` +^^^^^^^^^^^ + +A ``DILabel`` is a ``DICode``, which represents the identity of a source +language label. See :ref:`DILabel`. + +``DIExprCode`` +^^^^^^^^^^^^^^ + +.. 
code:: llvm + + distinct !DIExprCode() + +A ``DIExprCode`` is a ``DICode``, which represents a code location that can be +referenced by the ``argObjects`` field of a ``DILifetime`` as an argument to its +``location`` field’s ``DIExpr``. + +*[Note:* ``DIExprCode`` *does not represent a source language label and so +generates no debug information in itself. It is only used to allow a* ``DIExpr`` +*to refer to a code location address.]* + +.. _amdgpu-llvm-debug-dicompositetype: + +``DICompositeType`` +~~~~~~~~~~~~~~~~~~~ + +A ``DICompositeType`` represents the identity of a composite source program +type. See :ref:`DICompositeType`. + +For ``DICompositeType`` with a ``tag`` field of ``DW_TAG_array_type``, the +optional ``dataLocation``, ``associated``, and ``rank`` fields specify a +``DIFragment`` which is termed a type property fragment. + +If a type property fragment is referenced by the ``argObjects`` field of a +``DILifetime`` or by more than one ``DICompositeType`` field, then the metadata +is not well-formed. + +*[Note: The* ``DILifetime``\ *(s) that reference the type property fragment +specify the location description of the type property. Their* ``location`` +*field expression can use the* :ref:`amdgpu-llvm-debug-dioptypeobject` *operation to +get the location description of the instance of the composite type for which the +property is being evaluated. Their* ``argObjects`` *field can be used to specify +other* ``DIObject``\ *s if necessary.]* + +``DILifetime`` +~~~~~~~~~~~~~~ + +.. code:: llvm + + distinct !DILifetime(object: !DIObject, location: !DIExpr [, argObjects: {!DIObject,...} ] ) + +Represents a lifetime segment of a data object. A lifetime segment specifies a +location description expression, references a data object either explicitly or +implicitly, and defines when the lifetime segment applies. The location +description of a data object is defined by the, possibly empty, set of lifetime +segments that reference it. + +.. 
TODO:: + + Write up the fact that after LiveDebugValues this rule is amended, such that + for a bounded lifetime segment a call to ``llvm.dbg.def``/``llvm.dbg.kill`` + is local to the basic block. That is, rather than respecting control flow + ``llvm.dbg.def`` extends either to exactly one ``llvm.dbg.kill`` in the same + basic block, or to the end of the basic block. + +There are two kinds of lifetime segment: + +- A *bounded lifetime segment* is one referenced by the first argument of a + call to the ``llvm.dbg.def`` or ``llvm.dbg.kill`` intrinsic. + + A bounded lifetime segment is termed active if the current program location’s + instruction is in the range covered. The call to the ``llvm.dbg.def`` + intrinsic which specifies the ``DILifetime`` is the start of the range, which + extends along all forward control flow paths until it reaches either a call to a + ``llvm.dbg.kill`` intrinsic which specifies the same ``DILifetime``, or + the end of an exit basic block. + + If a bounded lifetime segment is not referenced by exactly one call ``D`` to + the ``llvm.dbg.def`` intrinsic, then the metadata is not well-formed. + + A bounded lifetime segment can be referenced by zero or more + ``llvm.dbg.kill`` intrinsics ``K``. If any member of ``K`` is not reachable + from ``D`` by following control flow, or if every control flow path for every + member of ``K`` passes through another member of ``K``, then the metadata is + not well-formed. + + See :ref:`amdgpu-llvm-debug-llvm-dbg-def` and + :ref:`amdgpu-llvm-debug-llvm-dbg-kill`. +- A *computed lifetime segment* is one not referenced by any call to the + ``llvm.dbg.def`` or ``llvm.dbg.kill`` intrinsic. + +A ``DILifetime`` which does not match exactly one of the above kinds is not +well-formed. + +The required ``object`` field specifies the data object of the lifetime segment. 
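+
+As a minimal sketch of a bounded lifetime segment, and not a definitive
+encoding, the fragment below starts and ends the range of one ``DILifetime``.
+It assumes that the ``DILifetime`` is the first argument of both intrinsics and
+that the second argument of ``llvm.dbg.def`` names the LLVM entity being
+described. The function ``@f``, the metadata numbering, and the
+``DIOpReferrer(i32)`` spelling are illustrative assumptions, and unrelated
+code and fields are elided with ``...``.
+
+.. code:: llvm
+
+  define void @f(i32 %x) {
+  entry:
+    ; Start of the range covered by the lifetime segment !2.
+    call void @llvm.dbg.def(metadata !2, metadata i32 %x)
+    ...
+    ; End of the range on this control flow path.
+    call void @llvm.dbg.kill(metadata !2)
+    ret void
+  }
+
+  !1 = !DILocalVariable(name: "x", ...)
+  ; Exactly one llvm.dbg.def call references !2, so it is a bounded segment.
+  !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(i32)))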
+ +The location description of a ``DIObject`` is a function of the current program +location’s instruction and the, possibly empty, set of lifetime segments with an +``object`` field that references the ``DIObject``: + +- If the ``DIObject`` is a global variable fragment, then the location + description is comprised of an implicit location description that has a + pointer value to the global variable that has a ``dbg.def`` metadata + attachment that references it. If a global variable fragment is referenced by + more than one global variable ``dbg.def`` metadata attachment or is + referenced by the ``object`` field of a ``DILifetime``, then the metadata is + not well-formed. +- Otherwise, if the current program location is defined, and any bounded + lifetime segment is active, then the location description is comprised of all + of the location descriptions of all active bounded lifetime segments. +- Otherwise, if there is a computed lifetime segment, then the location + description is comprised of the location description of the computed lifetime + segment. *[Note: A computed lifetime segment corresponds to the DWARF* + ``loclist`` *default location description.]* +- Otherwise, the location description is the undefined location description. + +*[Note: When multiple bounded lifetime segments for the same* +``DIObject`` *are active at a given instruction, it describes the +situation where an object exists simultaneously in more than one place. +For example, a variable may exist in memory and then be promoted to a +register where it is only read before being clobbered and reverting to +using the memory location. While promoted to the register, a debugger +may read from either the register or memory since they both have the +same value but must update both the register and memory if the value of +the variable needs to be changed.]* + +*[Note: A* ``DIObject`` *with no* ``DILifetime``\ *s has an undefined location +description. 
If the* ``argObjects`` *field of a* ``DILifetime`` *references such +a* ``DIObject`` *then the argument can be removed, and the* ``location`` +*expression updated to use the* ``DIOpConstant`` *with an* ``undef`` *value.]* + +The location description of a ``DICode`` is a single implicit location +description with a value that is the address of the start of the basic block +that contains the ``llvm.dbg.label`` intrinsic that references it. If a +``DICode`` is not referenced by exactly one call to the ``llvm.dbg.label`` +intrinsic, then the metadata is not well-formed. See +:ref:`amdgpu-llvm-debug-llvm-dbg-label`. + +The optional ``argObjects`` field specifies a tuple of zero or more input +``DIObject``\ s or ``DICode``\ s to the expression specified by the ``location`` +field. Omitting the ``argObjects`` field is equivalent to specifying it to be +the empty tuple. + +The required ``location`` field specifies the expression which evaluates to the +location description of the lifetime segment. 
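+
+As a hedged sketch of how the ``argObjects`` and ``location`` fields fit
+together, the following computed lifetime segment gives a ``DIGlobalVariable``
+the location of an LLVM global variable through its global variable fragment.
+The names, the metadata numbering, and the ``DIOpArg(0, ptr)`` and
+``DIOpDeref(i32)`` operand spellings are assumptions made for illustration, and
+unrelated fields are elided with ``...``.
+
+.. code:: llvm
+
+  @g = global i32 0, !dbg.def !0
+
+  ; This segment is not local to a single function, so this sketch assumes it
+  ; is retained by the module.
+  !llvm.dbg.retainedNodes = !{!2}
+
+  ; Global variable fragment: an implicit pointer to @g.
+  !0 = distinct !DIFragment()
+  !1 = distinct !DIGlobalVariable(name: "g", ...)
+  ; No llvm.dbg.def or llvm.dbg.kill call references !2, so it is a computed
+  ; lifetime segment. DIOpArg(0, ...) selects !0, the first (zero-based) entry
+  ; of argObjects, and DIOpDeref yields the location of the entity that the
+  ; pointer references.
+  !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpArg(0, ptr), DIOpDeref(i32)), argObjects: {!0})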
+ +*[Note: The expression may refer to an argument specified by the* ``argObjects`` +*field using the* :ref:`amdgpu-llvm-debug-dioparg` *operation and specifying its +zero-based position in the tuple.* + +*The expression of a bounded lifetime segment may refer to the LLVM entity +specified by the second argument of the call to the* ``llvm.dbg.def`` *intrinsic +that references it using the* :ref:`amdgpu-llvm-debug-diopreferrer` *operation.* + +*The expression of a lifetime segment may refer to the object instance of a type +for which a type property is being specified using the* +:ref:`amdgpu-llvm-debug-dioptypeobject` *operation.* + +*The expression of a lifetime segment may refer to a global variable in LLVM by +using the* :ref:`amdgpu-llvm-debug-dioparg` *operation to refer to a global +variable fragment referenced in the* ``argObjects`` *field.]* + +The reachable lifetime graph is the transitive closure of the graph formed by +the edges: + +- From each ``DIVariable`` (termed root nodes and also termed reachable + ``DIObject``\ s) to the ``DILifetime``\ s that reference them (termed + reachable ``DILifetime``\ s). +- From each ``DICompositeType`` (termed root nodes) to the ``DIFragment``\ s + that are referenced by the optional ``dataLocation``, ``associated``, and + ``rank`` fields (termed reachable ``DIObject``\ s). +- From each reachable ``DILifetime`` to the ``DIObject``\ s or ``DICode``\ s + referenced by their ``argObjects`` fields (termed reachable ``DIObject``\ s + or reachable ``DICode``\ s respectively). +- From each reachable ``DIObject`` to the ``DILifetime``\ s that reference them + (termed reachable ``DILifetime``\ s). + +If the reachable lifetime graph has any cycles or if any ``DILifetime``, +``DIFragment``, or ``DIExprCode`` is not in the reachable lifetime graph, then +the metadata is not well-formed. + +*[Note: In current debug information the* ``DILifetime`` *information is part of +the debug intrinsics. 
A new lifetime for an object is defined by using a debug +intrinsic to start a new lifetime. This means an object can have at most one +active lifetime for any given program location. Separating the lifetime +information into a separate metadata node allows multiple debug +intrinsics to begin different lifetime segments over the same program locations. +It also allows a debug intrinsic to indicate the end of the lifetime by +referencing the same lifetime as the intrinsic that started it.]* + +``DICompileUnit`` +~~~~~~~~~~~~~~~~~ + +A ``DICompileUnit`` represents the identity of a source program compile unit. See +:ref:`DICompileUnit`. + +All ``DICompileUnit`` compile units are required to be referenced by the +``!llvm.dbg.cu`` named metadata node of the LLVM module. + +All ``DIGlobalVariable`` global variables of the compile unit are required to be +referenced by the ``globals`` field of the ``DICompileUnit``. + +``DISubprogram`` +~~~~~~~~~~~~~~~~ + +A ``DISubprogram`` represents the identity of a source language program function +or a non-source language program function. See :ref:`DISubprogram`. + +A non-source language program function includes ``DIFlagArtificial`` in the +``flags`` field. + +All ``DILocalVariable`` local variables, ``DILabel`` labels, and ``DIExprCode`` +code locations of the function are required to be referenced by the +``retainedNodes`` field of the ``DISubprogram``. + +For all ``DILifetime`` computed lifetime segments that are part of the reachable +lifetime graph: + +1. If they only involve ``DILocalVariable``\ s, ``DICompositeType``\ s, and bounded + lifetime segments of the same function, then they are required to be referenced by + the ``retainedNodes`` field of the corresponding ``DISubprogram``. +2. Otherwise, they are required to be referenced by the ``!llvm.dbg.retainedNodes`` + named metadata node of the LLVM module. 
+ +*[Note: At the time computed lifetime segments are created, it is always well +defined if they are local to a function or are global.* + +*For example, a computed lifetime segment created only to define the location of +a local variable (or a piece of a local variable), would be retained by the +function that defines the local variable. If the function were deleted there is +no need for the computed lifetime segment any more.* + +*Similarly, a computed lifetime segment that contributes a lifetime to the +location description of a global variable (or fragment of a global variable) +using only local variables (or fragments of local variables) or bounded lifetime +segments of the same function, would be retained by the function that defines +the local variables (or fragments of local variables) or owns the bounded +lifetime segments. If the function were deleted there is no need for the +computed lifetime segment any more as the local variable (or fragment of a local +variable) references would need to be replaced with the undefined location +description, and the bounded lifetime segments would never be active.* + +*Otherwise, the computed lifetime segment applies to a global variable (or +fragment of a global variable) and either involves other global variables (or +fragments of global variables) or local variables (or fragments of local +variables) of multiple subprograms, and therefore needs to be retained by the +LLVM module. Deleting a subprogram must not delete the computed lifetime +segment, although any references to deleted local variables (or fragments of +deleted local variables) would need to be updated to be the undefined location +description.]* + +``DIExpr`` +~~~~~~~~~~ + +.. code:: llvm + + !DIExpr(DIOp, ...) + +Represents an expression, which is a sequence of one or more operations defined +in the following sections. 
+ +The evaluation of an expression is done in the context of an associated +``DILifetime`` that has a ``location`` field that references it. + +The evaluation of the expression is performed on an initially empty stack where +each stack element is a tuple of a type and a location description. The +expression is evaluated by evaluating each of its operations sequentially. + +The result of the evaluation is the typed location description of the single +resulting stack element. If the stack does not have a single element after +evaluation, then the expression is not well-formed. + +.. TODO:: + + Maybe operators should specify their input type(s)? It does not match what + DWARF does currently. Such types cannot trivially be used to enforce type + correctness since the expression language is an arbitrary stack, and in + general the whole expression has to be evaluated to determine the input types + to a given operation. + +Each operation definition begins with a specification which describes the +parameters to the operation, the entries it pops from the stack, and the entries +it pushes on the stack. The specification is accepted by the modified BNF +grammar in *Figure 1—LLVM IR Expression Operation Specification Syntax*, where +``[]`` denotes character classes, ``*`` denotes zero-or-more repetitions of a +term, and ``+`` denotes one-or-more repetitions of a term. + +**Figure 1—LLVM IR Expression Operation Specification Syntax** + +.. code:: bnf + + ::= + + ::= "(" ")" + ::= "" | + ::= ( ", " )+ + ::= ":" + ::= "type" | "unsigned" | "literal" | "addrspace" + + ::= "{" "->" "}" + ::= "" | + ::= ( " " )+ + ::= "(" ":" ")" + + ::= [A-Za-z]+ + ::= [A-Z] [A-Z0-9]* "'"* + +The ```` describes the LLVM IR concrete syntax of the +operation in an expression. + +The ```` defines positional parameters to the operation. 
+Each parameter in the list has a ```` which binds to the +argument passed via the parameter, and a ```` which +defines the kind of arguments accepted by the parameter. + +The ```` describes the kind of the parameter: + +- ``type``: An LLVM type. +- ``unsigned``: A non-negative literal integer. +- ``literal``: An LLVM literal value expression. +- ``addrspace``: An LLVM target-specific address space identifier. + +The ```` describe the effect of the operation on the +stack. The first ```` describes the "inputs" to the +operation, which are the entries it pops from the stack in the left-to-right +order. The second ```` describes the "outputs" of the +operation, which are the entries it pushes onto the stack in a right-to-left +order. In both cases the top stack element comes first on the left. + +If evaluation can result in a stack with fewer entries than required by an +operation, then the expression is not well-formed. + +Each ```` is a pair of ```` and +````. The ```` binds to the location description +of the stack entry. The ```` binds to the type of the stack entry and +denotes an LLVM type as defined in the :ref:`LLVM Language Reference Manual +`. + +Each ```` identifies a meta-syntactic variable, and each +```` may identify one or more meta-syntactic variables. When reading +the ``specification`` left-to-right, the first mention binds the meta-syntactic +variable to an entity, and subsequent mentions are an assertion that they are +the identical bound entity. If evaluation can result in parameters and stack +inputs that do not conform to the assertions, then the expression is not +well-formed. The assertions for stack outputs define post-conditions of the +operation output. + +The remaining body of the definition for an operation may reference the bound +meta-syntactic variable identifiers from the specification and may define +additional meta-syntactic variables following the same left-to-right binding +semantics. 
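The evaluation and binding rules above can be modelled with a short Python sketch. It is illustrative only and not LLVM code: the ``evaluate`` function, the tuple encoding of operations, and the use of a plain integer in place of an implicit location description are all assumptions made for this example. Only ``DIOpConstant`` and ``DIOpAdd`` are modelled.

```python
# Illustrative model only; none of these names are LLVM APIs.
# A stack entry is a (location, type) pair. A location is modelled as a
# plain integer that stands in for an implicit location description.

def evaluate(expr):
    """Evaluate a sequence of (operation, *parameters) tuples."""
    stack = []  # the last list element is the top of the stack
    for op, *params in expr:
        if op == "DIOpConstant":
            ty, value = params
            stack.append((value, ty))
        elif op == "DIOpAdd":
            if len(stack) < 2:
                raise ValueError("not well-formed: too few stack entries")
            l1, t1 = stack.pop()  # the first input binds to the top entry
            l2, t2 = stack.pop()
            if t1 != t2:
                raise ValueError("not well-formed: T is bound to two types")
            stack.append((l1 + l2, t1))  # integer overflow is not modelled
        else:
            raise ValueError(f"operation not modelled: {op}")
    if len(stack) != 1:
        raise ValueError("not well-formed: result is not a single entry")
    return stack[0]


result = evaluate([
    ("DIOpConstant", "i32", 2),
    ("DIOpConstant", "i32", 3),
    ("DIOpAdd",),
])
```

In this model an expression that leaves two entries on the stack, such as two constants with no combining operation, is rejected as not well-formed.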
+ +In the operation definitions, the following functions are defined: + +- ``bitsizeof(X)``: computes the size in bits of ``X``. +- ``sizeof(X)``: computes the size in bytes of ``X``, that is ``bitsizeof(X) / 8``. +- ``read(L, T)``: computes the value of type ``T`` obtained by retrieving + ``bitsizeof(T)`` bits from location description ``L``. If any bit of the + value retrieved is from the undefined location storage or the offset of any + bit exceeds the size of the location storage specified by any single + location description of ``L``, then the expression is not well-formed. + +.. TODO:: + + Consider defining reading undefined bits as producing an undefined location + description. This would need DWARF to adopt this model which may be necessary + as compilers support optimized code better. This would need all usage of + ``read`` to be reworded to specify the result if ``read`` detects undefined bits. + +.. _amdgpu-llvm-debug-diopreferrer: + +``DIOpReferrer`` +^^^^^^^^^^^^^^^^ + +.. code:: llvm + + DIOpReferrer(T:type) + { -> (L:T) } + +``L`` is the location description of the referrer ``R`` of the associated +lifetime segment ``LS``. If ``LS`` is not a bounded lifetime segment, then the +expression is not well-formed. + +If ``bitsizeof(T)`` is not equal to ``bitsizeof(R)``, then the expression is not +well-formed. + +*[Note: The referrer for an expression is specified by the second argument to +the* ``llvm.dbg.def`` *intrinsic which defines* ``LS``\ *.]* + +.. _amdgpu-llvm-debug-dioparg: + +``DIOpArg`` +^^^^^^^^^^^ + +.. code:: llvm + + DIOpArg(N:unsigned, T:type) + { -> (L:T) } + +``L`` is the location description of the ``N``\ :sup:`th` zero-based input ``I`` +to the expression. + +If there are fewer than ``N + 1`` inputs to the expression, then the expression +is not well-formed. If ``bitsizeof(T)`` is not equal to ``bitsizeof(I)``, then +the expression is not well-formed.
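The two well-formedness conditions for ``DIOpArg`` can be sketched as a small Python check. This is a hypothetical illustration, not an LLVM API: ``bitsizeof`` here only understands integer type names such as ``i32``, and each input is represented by a type name whose bit size stands in for ``bitsizeof(I)``.

```python
# Hypothetical helpers for illustration; not LLVM APIs.

def bitsizeof(ty):
    """Bit size of an LLVM integer type name such as "i32" (only case modelled)."""
    if not (ty.startswith("i") and ty[1:].isdigit()):
        raise ValueError(f"type not modelled in this sketch: {ty}")
    return int(ty[1:])


def diop_arg_is_well_formed(n, t, input_types):
    """Check DIOpArg(n, t) against the types of the expression's inputs."""
    if len(input_types) < n + 1:  # input n does not exist
        return False
    return bitsizeof(t) == bitsizeof(input_types[n])


inputs = ["i32", "i32"]  # two 32-bit inputs, for example the halves of an i64
```

With these inputs, ``DIOpArg(0, i32)`` and ``DIOpArg(1, i32)`` pass the check, while ``DIOpArg(2, i32)`` (no such input) and ``DIOpArg(1, i16)`` (size mismatch) do not.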
+ +*[Note: The inputs for an expression are specified by the* ``argObjects`` *field +of the* ``DILifetime`` *being evaluated which has a* ``location`` *field that +references the expression.]* + +.. _amdgpu-llvm-debug-dioptypeobject: + +``DIOpTypeObject`` +^^^^^^^^^^^^^^^^^^ + +.. code:: llvm + + DIOpTypeObject(T:type) + { -> (L:T) } + +``LS`` is the lifetime segment associated with the expression containing +``DIOpTypeObject``. ``TPF`` is the type property fragment that is evaluating +``LS``. ``LT`` is the ``DIType`` that has a type property field ``TP`` that +references ``TPF``. ``L`` is the location description of the instance ``O`` of +an object of type ``LT`` for which the type property ``TP`` is being evaluated. +See :ref:`amdgpu-llvm-debug-dicompositetype`. + +If ``LS`` can be evaluated other than to obtain the location description of a +type property fragment, then the expression is not well-formed. *[Note: This +implies that a type property fragment cannot be referenced by the* ``argObjects`` +*field of a* ``DILifetime``\ *.]* If ``bitsizeof(T)`` is not equal to +``bitsizeof(LT)``, then the expression is not well-formed. + +.. TODO:: + + Should a distinguished ``DIFragment`` be used for this like for LLVM global + variables? There could be a uniqued type object fragment referenced by the + ``!llvm.dbg.typeObject`` named metadata node of the LLVM module. + +``DIOpConstant`` +^^^^^^^^^^^^^^^^ + +.. code:: llvm + + DIOpConstant(T:type V:literal) + { -> (L:T) } + +``V`` is a literal value of type ``T`` or the ``undef`` value. + +If ``V`` is the ``undef`` value, then ``L`` comprises one undefined location +description ``IL``. + +Otherwise, ``L`` comprises one implicit location description ``IL``. ``IL`` +specifies implicit location storage ``ILS`` and offset 0. ``ILS`` has value +``V`` and size ``bitsizeof(T)``. + +``DIOpConvert`` +^^^^^^^^^^^^^^^ + +.. 
code:: llvm + + DIOpConvert(T':type) + { (L:T) -> (L':T') } + +``L'`` comprises one implicit location description ``IL``. ``IL`` specifies +implicit location storage ``ILS`` and offset 0. ``ILS`` has value ``V`` and size +``bitsizeof(T')``. + +``V`` is the value ``read(L, T)`` converted to type ``T'``. + +*[Note: The conversions used should be limited to those supported by the target +debug format. For example, when the target debug format is DWARF, the +conversions used should be limited to those supported by the* ``DW_OP_convert`` +*operation.]* + +``DIOpReinterpret`` +^^^^^^^^^^^^^^^^^^^ + +.. code:: llvm + + DIOpReinterpret(T':type) + { (L:T) -> (L:T') } + +If ``bitsizeof(T)`` is not equal to ``bitsizeof(T')``, then the expression is +not well-formed. + +``DIOpBitOffset`` +^^^^^^^^^^^^^^^^^ + +.. code:: llvm + + DIOpBitOffset(T':type) + { (B:I) (L:T) -> (L':T') } + +``L'`` is ``L``, but updated by adding ``read(B, I)`` to its bit offset. + +If ``I`` is not an integral type, then the expression is not well-formed. + +*[Note:* ``I`` *may be a signed or unsigned integral type.]* + +``DIOpByteOffset`` +^^^^^^^^^^^^^^^^^^ + +.. code:: llvm + + DIOpByteOffset(T':type) + { (B:I) (L:T) -> (L':T') } + +``(L':T')`` is as if ``DIOpBitOffset(T')`` was evaluated with a stack containing +``(B * 8:I) (L:T)``. + +``DIOpComposite`` +^^^^^^^^^^^^^^^^^ + +.. code:: llvm + + DIOpComposite(N:unsigned, T:type) + { (LN:TN) (LN-1:TN-1) ... (L1:T1) -> (L:T) } + +*[Note: The leftmost element of the input stack-list binds to the top stack +entry. In this case,* ``(LN:TN)`` *binds to the top stack entry.]* + +``L`` comprises one complete composite location description ``CL`` with offset +0. The location storage associated with ``CL`` comprises ``N`` parts each +of bit size ``bitsizeof(TM)`` starting at the location storage specified by +``LM``. The parts are concatenated with no intervening padding starting at +offset 0 in order with ``M`` going from 1 to ``N``.
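The ordering of parts in ``DIOpComposite`` is easy to get backwards, so a minimal Python sketch may help. It is an illustrative model rather than LLVM code: the ``diop_composite`` function, the ``(location, bit_size)`` stack entries, and the ``(bit_offset, bit_size, location)`` layout tuples are assumptions for this example. It also applies the size check that this section requires, namely that the part sizes sum to ``bitsizeof(T)``.

```python
# Illustrative model only; not an LLVM API.
# A stack entry is (location, bit_size); the composite location is modelled
# as a list of (bit_offset, bit_size, location) parts.

def diop_composite(stack, n, total_bits):
    """Apply DIOpComposite(n, T), where bitsizeof(T) == total_bits."""
    if n < 1:
        raise ValueError("n == 0 is not modelled in this sketch")
    if len(stack) < n:
        raise ValueError("not well-formed: too few stack entries")
    parts = stack[-n:]  # parts[0] is (L1:T1); parts[-1] is the top, (LN:TN)
    del stack[-n:]
    if sum(bits for _, bits in parts) != total_bits:
        raise ValueError("not well-formed: part sizes do not sum to bitsizeof(T)")
    layout, offset = [], 0
    for location, bits in parts:  # M goes from 1 to N, with no padding
        layout.append((offset, bits, location))
        offset += bits
    stack.append((layout, total_bits))


stack = [("const 0", 8), ("const 1", 8)]  # DIOpConstant(i8 0), DIOpConstant(i8 1)
diop_composite(stack, 2, 16)              # DIOpComposite(2, i16)
```

After the call, the single remaining stack entry places the first pushed constant at bit offset 0 and the second at bit offset 8, so the entry pushed earliest occupies the lowest offset.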
+ +If the sum of ``bitsizeof(TM)`` for ``M`` from 1 to ``N`` does not equal +``bitsizeof(T)``, then the expression is not well-formed. + +*[Note: As an example, the location storage associated with the composite +location description created by the expression* ``DIOpConstant(i8 0), +DIOpConstant(i8 1), DIOpComposite(2, i16)`` *comprises 2 bytes, with the first +byte being set to 0 and the second byte set to 1.]* + +If there are multiple parts that ultimately, after expanding referenced +composites, refer to the same bits of a non-implicit location storage, then the +expression is not well-formed. + +*[Note: A debugger could not in general assign a value to such a composite +location description as different parts of the assigned value may have different +values but map to different parts of the composite location description that are +associated with the same bits of a location storage. Any given bits of location +storage can only hold a single value at a time. An implicit location description +does not permit assignment, and so the same bits of its value can be present in +multiple parts of a composite location description.]* + +``DIOpExtend`` +^^^^^^^^^^^^^^ + +.. code:: llvm + + DIOpExtend(N:unsigned) + { (L:T) -> (L':<N x T>) } + +``(L':<N x T>)`` is as if ``DIOpComposite(N, <N x T>)`` was applied to a stack +containing ``N`` copies of ``(L:T)``. + +If ``T`` is not an integral type, floating point type, or pointer type, then the +expression is not well-formed. + +``DIOpSelect`` +^^^^^^^^^^^^^^ + +.. code:: llvm + + DIOpSelect() + { (LM:TM) (L1:<N x T>) (L0:<N x T>) -> (L:<N x T>) } + +``M`` is a bit mask with the value ``read(LM, TM)``. If ``bitsizeof(TM)`` is +less than ``N``, then the expression is not well-formed. + +``(L:<N x T>)`` is as if ``DIOpComposite(N, <N x T>)`` was applied to a stack +containing ``N`` entries ``(LI:T)`` ordered in descending ``I`` from ``N - 1`` +to 0 inclusive. Each ``LI`` is as if ``DIOpBitOffset(T)`` was applied to a stack +containing ``(I * bitsizeof(T):TI) (PLI:T)``.
``PLI`` is the same as ``L0`` if +the ``I``\ :sup:`th` least significant bit of ``M`` is zero, otherwise it is the +same as ``L1``. ``TI`` is some integral type that can represent the range 0 to +``(N - 1) * bitsizeof(T)``. + +If ``T`` is not an integral type, floating point type, or pointer type, then the +expression is not well-formed. + +.. _amdgpu-llvm-debug-diopaddrof: + +``DIOpAddrOf`` +^^^^^^^^^^^^^^ + +.. code:: llvm + + DIOpAddrOf(N:addrspace) + { (L:T) -> (L':ptr addrspace(N)) } + +``L'`` comprises one implicit address location description ``IAL``. ``IAL`` +specifies implicit address location storage ``IALS`` and offset 0. + +``IALS`` is ``bitsizeof(ptr addrspace(N))`` bits and conceptually holds a +reference to the storage that ``L`` denotes. If ``DIOpDeref(T)`` is applied to +the resulting ``(L':ptr addrspace(N))``, then it will result in ``(L:T)``. If +any other operation is applied, then the expression is not well-formed. + +*[Note:* ``DIOpAddrOf`` *can be used for any location description kind of* +``L``\ *, not just memory location descriptions.]* + +*[Note: DWARF only supports creating implicit pointer location descriptions for +variables or DWARF procedures. It does not support creating them for an +arbitrary location description expression. The examples below cover the current +LLVM optimizations and only use* ``DIOpAddrOf`` *applied to* ``DIOpReferrer``\ +*,* ``DIOpArg``\ *, and* ``DIOpConstant``\ *. All these cases can map onto +existing DWARF in a straightforward manner. There would be more complexity if* +``DIOpAddrOf`` *was used in other situations. Such usage could either be +addressed by dropping debug information as LLVM currently does in numerous +situations, or by adding additional DWARF extensions.]* + +``DIOpDeref`` +^^^^^^^^^^^^^ + +.. code:: llvm + + DIOpDeref(T:type) + { (L:ptr addrspace(N)) -> (L':T) } + +If ``(L:ptr addrspace(N))`` was produced by a ``DIOpAddrOf`` operation, then +see :ref:`amdgpu-llvm-debug-diopaddrof`.
+ +Otherwise, ``L'`` comprises one memory location description ``MLD``. ``MLD`` +specifies bit offset ``read(L, ptr addrspace(N)) * 8`` and the memory location +storage corresponding to address space ``N``. + +*[Note: This operation is not related to the DWARF operation of the same name,* +``DW_OP_deref``\ *. This operation instead borrows its name from the +"dereference operator"* ``\*`` *in C, with which it shares very similar +semantics. The DWARF operation instead corresponds to* ``DIOpRead`` *below.]* + +``DIOpRead`` +^^^^^^^^^^^^ + +.. code:: llvm + + DIOpRead() + { (L:T) -> (L':T) } + +``L'`` comprises one implicit location description ``IL``. ``IL`` specifies +implicit location storage ``ILS`` and offset 0. ``ILS`` has value ``read(L, T)`` +and size ``bitsizeof(T)``. + +``DIOpAdd`` +^^^^^^^^^^^ + +.. code:: llvm + + DIOpAdd() + { (L1:T) (L2:T) -> (L:T) } + +``L`` comprises one implicit location description ``IL``. ``IL`` specifies +implicit location storage ``ILS`` and offset 0. ``ILS`` has value ``read(L1, T) ++ read(L2, T)`` and size ``bitsizeof(T)``. + +``DIOpSub`` +^^^^^^^^^^^ + +.. code:: llvm + + DIOpSub() + { (L1:T) (L2:T) -> (L:T) } + +``L`` comprises one implicit location description ``IL``. ``IL`` specifies +implicit location storage ``ILS`` and offset 0. ``ILS`` has value ``read(L2, T) +- read(L1, T)`` and size ``bitsizeof(T)``. + +``DIOpMul`` +^^^^^^^^^^^ + +.. code:: llvm + + DIOpMul() + { (L1:T) (L2:T) -> (L:T) } + +``L`` comprises one implicit location description ``IL``. ``IL`` specifies +implicit location storage ``ILS`` and offset 0. ``ILS`` has value ``read(L2, T) +* read(L1, T)`` and size ``bitsizeof(T)``. + +``DIOpDiv`` +^^^^^^^^^^^ + +.. code:: llvm + + DIOpDiv() + { (L1:T) (L2:T) -> (L:T) } + +``L`` comprises one implicit location description ``IL``. ``IL`` specifies +implicit location storage ``ILS`` and offset 0. ``ILS`` has value ``read(L2, T) +/ read(L1, T)`` and size ``bitsizeof(T)``. + +``DIOpShr`` +^^^^^^^^^^^ + +..
code:: llvm + + DIOpShr() + { (L1:T) (L2:T) -> (L:T) } + +``L`` comprises one implicit location description ``IL``. ``IL`` specifies +implicit location storage ``ILS`` and offset 0. ``ILS`` has value ``read(L2, T) +>> read(L1, T)`` and size ``bitsizeof(T)``. If ``T`` is an unsigned integral +type, then the result is filled with 0 bits. If ``T`` is a signed integral type, +then the result is filled with the sign bit of ``read(L2, T)``. + +If ``T`` is not an integral type, then the expression is not well-formed. + +``DIOpShl`` +^^^^^^^^^^^ + +.. code:: llvm + + DIOpShl() + { (L1:T) (L2:T) -> (L:T) } + +``L`` comprises one implicit location description ``IL``. ``IL`` specifies +implicit location storage ``ILS`` and offset 0. ``ILS`` has value ``read(L2, T) +<< read(L1, T)`` and size ``bitsizeof(T)``. The result is filled with 0 bits. + +If ``T`` is not an integral type, then the expression is not well-formed. + +``DIOpPushLane`` +^^^^^^^^^^^^^^^^ + +.. code:: llvm + + DIOpPushLane(T:type) + { -> (L:T) } + +``L`` comprises one implicit location description ``IL``. ``IL`` specifies +implicit location storage ``ILS`` and offset 0. ``ILS`` has the value of the +target architecture lane identifier of the current source language thread of +execution if the source language is implemented using a SIMD or SIMT execution +model. + +If ``T`` is not an integral type or the source language is not implemented using +a SIMD or SIMT execution model, then the expression is not well-formed. + +Intrinsics +---------- + +The intrinsics define the program location range over which the location +description specified by a bounded lifetime segment of a ``DILifetime`` is +active. They support defining a single or multiple locations for a source +program variable. Multiple locations can be active at the same program location +as supported by :ref:`amdgpu-dwarf-location-list-expressions`. + +.. _amdgpu-llvm-debug-llvm-dbg-def: + +``llvm.dbg.def`` +~~~~~~~~~~~~~~~~ + +..
code:: llvm + + void @llvm.dbg.def(metadata, metadata) + +The first argument to ``llvm.dbg.def`` is required to be a ``DILifetime`` and is +the beginning of the bounded lifetime being defined. + +The second argument to ``llvm.dbg.def`` is required to be a value-as-metadata +and defines the LLVM entity acting as the referrer of the bounded lifetime +segment specified by the first argument. A value of ``undef`` is allowed and +specifies the undefined location description. + +*[Note:* ``undef`` *can be used when the lifetime segment expression does not +use a* ``DIOpReferrer`` *operation, either because the expression evaluates to a +constant implicit location description, or because it only uses* ``DIOpArg`` +*operations for inputs.]* + +The MC pseudo instruction equivalent is ``DBG_DEF`` which has the same two +arguments with the same meaning: + +.. code:: llvm + + DBG_DEF metadata, + +.. _amdgpu-llvm-debug-llvm-dbg-kill: + +``llvm.dbg.kill`` +~~~~~~~~~~~~~~~~~ + +.. code:: llvm + + void @llvm.dbg.kill(metadata) + +The argument to ``llvm.dbg.kill`` is required to be a ``DILifetime`` and is the +end of the lifetime being killed. + +Every call to the ``llvm.dbg.kill`` intrinsic is required to be reachable from a +call to the ``llvm.dbg.def`` intrinsic which specifies the same ``DILifetime``, +otherwise it is not well-formed. + +The MC pseudo instruction equivalent is ``DBG_KILL`` which has the same argument +with the same meaning: + +.. code:: llvm + + DBG_KILL metadata + +.. _amdgpu-llvm-debug-llvm-dbg-label: + +``llvm.dbg.label`` +~~~~~~~~~~~~~~~~~~ + +.. code:: llvm + + void @llvm.dbg.label(metadata) + +The argument to ``llvm.dbg.label`` is required to be a ``DICode`` and defines +its address value to be the code address of the start of the basic block that +contains it. + +The MC pseudo instruction equivalent is ``DBG_LABEL`` which has the same +argument with the same meaning: + +.. 
code:: llvm + + DBG_LABEL metadata + +Examples +======== + +Examples which need meta-syntactic variables prefix them with a sigil to +concisely give context. The prefix sigils are: + +========= ======================================================== +**Sigil** **Meaning** +========= ======================================================== +% SSA IR Value +$ Non-SSA MIR Register (for example, post phi-elimination) +# Arbitrary literal constant +========= ======================================================== + +The syntax used in the examples attempts to match LLVM IR/MIR as closely as +possible, with the only new syntax required being that of the expression +language. + +Variable Located In An ``alloca`` +--------------------------------- + +The frontend will generate ``alloca``\ s for every variable, and can trivially +insert a single ``DILifetime`` covering the whole body of the function, with +the expression ``DIExpr(DIOpReferrer(ptr addrspace(#stack)), +DIOpDeref())``, referring to the ``alloca``. Walking the debug intrinsics +provides the necessary information to generate the DWARF ``DW_AT_location`` +attributes on variables. + +.. code:: llvm + :number-lines: + + %x.addr = alloca i64, addrspace(5) + call void @llvm.dbg.def(metadata !2, metadata ptr addrspace(5) %x.addr) + store i64 ..., ptr addrspace(5) %x.addr + ... + call void @llvm.dbg.kill(metadata !2) + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(ptr addrspace(5)), DIOpDeref(i64))) + +Variable Promoted To An SSA Register +------------------------------------ + +The promotion semantically removes one level of indirection, and correspondingly +in the debug expressions for which the ``alloca`` being replaced was the +referrer, an additional ``DIOpAddrOf(N)`` is needed. + +An example is ``mem2reg`` where an ``alloca`` can be replaced with an SSA value +such that the following: + +..
code:: llvm + :number-lines: + + %x.addr = alloca i64, addrspace(5) + call void @llvm.dbg.def(metadata !2, metadata ptr addrspace(5) %x.addr) + store i64 ..., ptr addrspace(5) %x.addr + ... + call void @llvm.dbg.kill(metadata !2) + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(ptr addrspace(5)), DIOpDeref(i64))) + +Now becomes: + +.. code:: llvm + :number-lines: + + %x = i64 ... + call void @llvm.dbg.def(metadata !2, metadata i64 %x) + ... + call void @llvm.dbg.kill(metadata !2) + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(i64), DIOpAddrOf(5), DIOpDeref(i64))) + +The canonical form of this is then just ``DIOpReferrer(i64)`` as the pair of +``DIOpAddrOf(N)``, ``DIOpDeref(i64)`` cancel out: + +.. code:: llvm + :number-lines: + + %x = i64 ... + call void @llvm.dbg.def(metadata !2, metadata i64 %x) + ... + call void @llvm.dbg.kill(metadata !2) + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(i64))) + +Implicit Pointer Location Description +------------------------------------- + +The transformation for removing a level of indirection is to add a +``DIOpAddrOf(N)``, which may result in a location description for a pointer to a +non-memory object. + +.. code:: c + :number-lines: + + int x = ...; + int *p = &x; + return *p; + +.. code:: llvm + :number-lines: + + %x.addr = alloca i64, addrspace(5) + call void @llvm.dbg.def(metadata !2, metadata ptr addrspace(5) %x.addr) + store i64 ..., ptr addrspace(5) %x.addr + %p.addr = alloca ptr addrspace(5), addrspace(5) + call void @llvm.dbg.def(metadata !4, metadata ptr addrspace(5) %p.addr) + store ptr addrspace(5) %x.addr, ptr addrspace(5) %p.addr + %0 = load ptr addrspace(5), ptr addrspace(5) %p.addr + %1 = load i64, ptr addrspace(5) %0 + ret i64 %1 + + !1 = !DILocalVariable("x", ...)
+ !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(ptr addrspace(5)), DIOpDeref(i64))) + !3 = !DILocalVariable("p", ...) + !4 = distinct !DILifetime(object: !3, location: !DIExpr(DIOpReferrer(ptr addrspace(5)), DIOpDeref(ptr addrspace(5)))) + +*[Note: The* ``llvm.dbg.def`` *could either be placed after the* ``alloca`` *or +after the* ``store`` *that defines the variable's initial value. The difference +is whether the debugger will be able to allow the user to access the variable +before it is initialized. Proposals exist to allow the compiler to communicate +when a variable is uninitialized separately from defining its location.]* + +The first round of ``mem2reg`` promotes ``%p.addr`` to an SSA register ``%p``: + +.. code:: llvm + :number-lines: + + %x.addr = alloca i64, addrspace(5) + store i64 ..., ptr addrspace(5) %x.addr + call void @llvm.dbg.def(metadata !2, metadata ptr addrspace(5) %x.addr) + %p = ptr addrspace(5) %x.addr + call void @llvm.dbg.def(metadata !4, metadata ptr addrspace(5) %p) + %0 = load i64, ptr addrspace(5) %p + return i64 %0 + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(ptr addrspace(5)), DIOpDeref(i64))) + !3 = !DILocalVariable("p", ...) + !4 = distinct !DILifetime(object: !3, location: !DIExpr(DIOpReferrer(ptr addrspace(5)), DIOpAddrOf(5), DIOpDeref(ptr addrspace(5)))) + +Collapsing ``DIOpAddrOf(5), DIOpDeref(ptr addrspace(5))``: + +.. code:: llvm + :number-lines: + + %x.addr = alloca i64, addrspace(5) + store i64 ..., ptr addrspace(5) %x.addr + call void @llvm.dbg.def(metadata !2, metadata ptr addrspace(5) %x.addr) + %p = ptr addrspace(5) %x.addr + call void @llvm.dbg.def(metadata !4, metadata ptr addrspace(5) %p) + %0 = load i64, ptr addrspace(5) %p + return i64 %0 + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(ptr addrspace(5)), DIOpDeref(i64))) + !3 = !DILocalVariable("p", ...)
+ !4 = distinct !DILifetime(object: !3, location: !DIExpr(DIOpReferrer(ptr addrspace(5)))) + + +Simplify by eliminating ``%p`` and directly using ``%x.addr``: + +.. code:: llvm + :number-lines: + + %x.addr = alloca i64, addrspace(5) + store i64 ..., ptr addrspace(5) %x.addr + call void @llvm.dbg.def(metadata !2, metadata ptr addrspace(5) %x.addr) + call void @llvm.dbg.def(metadata !4, metadata ptr addrspace(5) %x.addr) + %0 = load i64, ptr addrspace(5) %x.addr + return i64 %0 + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(ptr addrspace(5)), DIOpDeref(i64))) + !3 = !DILocalVariable("p", ...) + !4 = distinct !DILifetime(object: !3, location: !DIExpr(DIOpReferrer(ptr addrspace(5)))) + +Second round of ``mem2reg`` promotes ``%x.addr`` to an SSA register ``%x``: + +.. code:: llvm + :number-lines: + + %x = i64 ... + call void @llvm.dbg.def(metadata !2, metadata i64 %x) + call void @llvm.dbg.def(metadata !4, metadata i64 %x) + %0 = i64 %x + return i64 %0 + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(i64), DIOpAddrOf(5), DIOpDeref(i64))) + !3 = !DILocalVariable("p", ...) + !4 = distinct !DILifetime(object: !3, location: !DIExpr(DIOpReferrer(i64), DIOpAddrOf(5))) + +Simplify by collapsing ``DIOpAddrOf(5), DIOpDeref(i64)`` and using ``%x`` +directly in the ``return``: + +.. code:: llvm + :number-lines: + + %x = i64 ... + call void @llvm.dbg.def(metadata !2, metadata i64 %x) + call void @llvm.dbg.def(metadata !4, metadata i64 %x) + return i64 %x + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(i64))) + !3 = !DILocalVariable("p", ...) + !4 = distinct !DILifetime(object: !3, location: !DIExpr(DIOpReferrer(i64), DIOpAddrOf(5))) + +If ``%x`` is being assigned a constant, constant propagation will eliminate +``%x`` entirely and substitute all uses with the constant: + +.. 
code:: llvm + :number-lines: + + call void @llvm.dbg.def(metadata !2, metadata i1 undef) + call void @llvm.dbg.def(metadata !4, metadata i1 undef) + return i64 ... + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpConstant(i64 ...))) + !3 = !DILocalVariable("p", ...) + !4 = distinct !DILifetime(object: !3, location: !DIExpr(DIOpConstant(i64 ...), DIOpAddrOf(5))) + +.. _amdgpu-llvm-debug-local-variable-broken-into-two-scalars: + +Local Variable Broken Into Two Scalars +-------------------------------------- + +When a transformation decomposes one location into multiple distinct ones, it +needs to follow all ``llvm.dbg.def`` intrinsics to the ``DILifetime``\ s +referencing the original location and update the expression and positional +arguments such that: + +- All instances of ``DIOpReferrer()`` in the original expression are replaced + with the appropriate composition of all the new location pieces, now encoded + via multiple ``DIOpArg()`` operations referring to input ``DIObject``\ s, + and a ``DIOpComposite()`` operation. This makes the associated + ``DILifetime`` a computed lifetime segment. +- Those location pieces are represented by new ``DIFragment``\ s, one per new + location, each with appropriate ``DILifetime``\ s referenced by new + ``llvm.dbg.def`` and ``llvm.dbg.kill`` intrinsics. + +It is assumed that any pass capable of doing the decomposition in the first +place needs to have all of this information available, and the structure of the +new intrinsics and metadata avoids any costly operations during +transformations. This update is also "shallow", in that only the ``DILifetime`` +which is immediately referenced by the relevant ``llvm.dbg.def``\ s need to be +updated, as the result is referentially transparent to any other dependent +``DILifetime``\ s. + +.. code:: llvm + :number-lines: + + %x = i64 ... + call void @llvm.dbg.def(metadata !2, metadata i64 %x) + ... 
+ call void @llvm.dbg.kill(metadata !2) + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(i64))) + +Decomposing the ``i64 %x`` SSA value into two ``i32`` SSA values: + +.. code:: llvm + :number-lines: + + %x.lo = i32 ... + call void @llvm.dbg.def(metadata !4, metadata i32 %x.lo) + ... + %x.hi = i32 ... + call void @llvm.dbg.def(metadata !6, metadata i32 %x.hi) + ... + call void @llvm.dbg.kill(metadata !6) + call void @llvm.dbg.kill(metadata !4) + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpArg(0, i32), DIOpArg(1, i32), DIOpComposite(2, i64)), argObjects: {!3, !5}) + !3 = distinct !DIFragment() + !4 = distinct !DILifetime(object: !3, location: !DIExpr(DIOpReferrer(i32))) + !5 = distinct !DIFragment() + !6 = distinct !DILifetime(object: !5, location: !DIExpr(DIOpReferrer(i32))) + +Further Decomposition Of An Already SRoA’d Local Variable +--------------------------------------------------------- + +An example to demonstrate the "shallow update" property is to take the IR from +:ref:`amdgpu-llvm-debug-local-variable-broken-into-two-scalars`: + +.. code:: llvm + :number-lines: + + %x.lo = i32 ... + call void @llvm.dbg.def(metadata !4, metadata i32 %x.lo) + ... + %x.hi = i32 ... + call void @llvm.dbg.def(metadata !6, metadata i32 %x.hi) + ... + call void @llvm.dbg.kill(metadata !6) + call void @llvm.dbg.kill(metadata !4) + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpArg(0, i32), DIOpArg(1, i32), DIOpComposite(2, i64)), argObjects: {!3, !5}) + !3 = distinct !DIFragment() + !4 = distinct !DILifetime(object: !3, location: !DIExpr(DIOpReferrer(i32))) + !5 = distinct !DIFragment() + !6 = distinct !DILifetime(object: !5, location: !DIExpr(DIOpReferrer(i32))) + +And subdivide ``%x.hi`` again: + +.. code:: llvm + :number-lines: + + %x.lo = i32 ... 
+ call void @llvm.dbg.def(metadata !4, metadata i32 %x.lo) + %x.hi.lo = i16 ... + call void @llvm.dbg.def(metadata !8, metadata i16 %x.hi.lo) + %x.hi.hi = i16 ... + call void @llvm.dbg.def(metadata !10, metadata i16 %x.hi.hi) + ... + call void @llvm.dbg.kill(metadata !10) + call void @llvm.dbg.kill(metadata !8) + call void @llvm.dbg.kill(metadata !4) + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpArg(0, i32), DIOpArg(1, i32), DIOpComposite(2, i64)), argObjects: {!3, !5}) + !3 = distinct !DIFragment() + !4 = distinct !DILifetime(object: !3, location: !DIExpr(DIOpReferrer(i32))) + !5 = distinct !DIFragment() + !6 = distinct !DILifetime(object: !5, location: !DIExpr(DIOpArg(0, i16), DIOpArg(1, i16), DIOpComposite(2, i32)), argObjects: {!7, !9}) + !7 = distinct !DIFragment() + !8 = distinct !DILifetime(object: !7, location: !DIExpr(DIOpReferrer(i16))) + !9 = distinct !DIFragment() + !10 = distinct !DILifetime(object: !9, location: !DIExpr(DIOpReferrer(i16))) + +Note that the expression for the original source variable ``x`` did not need to +be changed, as it is defined in terms of the ``DIFragment``, the identity of +which is not changed after it is created. + +Local Variable In Alloca Broken Into Two ``alloca``\ s +------------------------------------------------------ + +Similar to the case described in +:ref:`amdgpu-llvm-debug-local-variable-broken-into-two-scalars`, when an +``alloca`` is decomposed into two ``alloca``\ s all instances of +``DIOpReferrer()`` need to be replaced with a composition of the new +``alloca``\ s, but in this case an additional ``DIOpAddrOf()`` is required to +reflect the fact that there is no direct representation in LLVM remaining for +the pointer to the composite. In a situation where that pointer was only used +as input to a ``DIOpDeref()`` it can be collapsed away. + +Consider the initial program: + +.. 
code:: llvm + :number-lines: + + %x.addr = alloca i64, addrspace(5) + call void @llvm.dbg.def(metadata !2, metadata ptr addrspace(5) %x.addr) + ... + call void @llvm.dbg.kill(metadata !2) + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(ptr addrspace(5)), DIOpDeref(i64))) + +Decomposing the ``alloca i64`` into two ``alloca i32``: + +.. code:: llvm + :number-lines: + + %x.lo.addr = alloca i32, addrspace(5) + call void @llvm.dbg.def(metadata !4, metadata ptr addrspace(5) %x.lo.addr) + ... + %x.hi.addr = alloca i32, addrspace(5) + call void @llvm.dbg.def(metadata !6, metadata ptr addrspace(5) %x.hi.addr) + ... + call void @llvm.dbg.kill(metadata !6) + call void @llvm.dbg.kill(metadata !4) + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpArg(0, i32), DIOpArg(1, i32), DIOpComposite(2, i64), DIOpAddrOf(5), DIOpDeref(i64)), argObjects: {!3, !5}) + !3 = distinct !DIFragment() + !4 = distinct !DILifetime(object: !3, location: !DIExpr(DIOpReferrer(ptr addrspace(5)), DIOpDeref(i32))) + !5 = distinct !DIFragment() + !6 = distinct !DILifetime(object: !5, location: !DIExpr(DIOpReferrer(ptr addrspace(5)), DIOpDeref(i32))) + +Simplify by collapsing ``DIOpAddrOf(5), DIOpDeref(i64)``: + +.. code:: llvm + :number-lines: + + %x.lo.addr = alloca i32, addrspace(5) + call void @llvm.dbg.def(metadata !4, metadata ptr addrspace(5) %x.lo.addr) + ... + %x.hi.addr = alloca i32, addrspace(5) + call void @llvm.dbg.def(metadata !6, metadata ptr addrspace(5) %x.hi.addr) + ... + call void @llvm.dbg.kill(metadata !6) + call void @llvm.dbg.kill(metadata !4) + + !1 = !DILocalVariable("x", ...) 
+ !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpArg(0, i32), DIOpArg(1, i32), DIOpComposite(2, i64)), argObjects: {!3, !5}) + !3 = distinct !DIFragment() + !4 = distinct !DILifetime(object: !3, location: !DIExpr(DIOpReferrer(ptr addrspace(5)), DIOpDeref(i32))) + !5 = distinct !DIFragment() + !6 = distinct !DILifetime(object: !5, location: !DIExpr(DIOpReferrer(ptr addrspace(5)), DIOpDeref(i32))) + +Note that, equivalently, we could represent the intermediate case by exposing +the pointer for each ``alloca i32``: + +.. code:: llvm + :number-lines: + + %x.lo.addr = alloca i32, addrspace(5) + call void @llvm.dbg.def(metadata !4, metadata ptr addrspace(5) %x.lo.addr) + ... + %x.hi.addr = alloca i32, addrspace(5) + call void @llvm.dbg.def(metadata !6, metadata ptr addrspace(5) %x.hi.addr) + ... + call void @llvm.dbg.kill(metadata !6) + call void @llvm.dbg.kill(metadata !4) + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpArg(0, ptr addrspace(5)), DIOpDeref(i32), DIOpArg(1, ptr addrspace(5)), DIOpDeref(i32), DIOpComposite(2, i64), DIOpAddrOf(5), DIOpDeref(i64)), argObjects: {!3, !5}) + !3 = distinct !DIFragment() + !4 = distinct !DILifetime(object: !3, location: !DIExpr(DIOpReferrer(ptr addrspace(5)))) + !5 = distinct !DIFragment() + !6 = distinct !DILifetime(object: !5, location: !DIExpr(DIOpReferrer(ptr addrspace(5)))) + +The former approach may be slightly preferable as it requires less storage, +because the two copies of ``DIOpDeref(i32)`` are shared across multiple +references to a uniqued expression, rather than appearing sequentially in a +single expression. + +.. TODO:: + + Are there situations where pushing the DIOpDeref into the expression with + the composite is useful? A source pointer can correspond to one of the + fragments created, and the transformation could still be valid under some + circumstances. Is this possible in today's compiler?
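
The composition of pieces and the "shallow update" property used in the
decomposition examples of this and the preceding sections can be illustrated
with a toy model. The following Python sketch is not LLVM code: the
``Fragment`` class, the ``composite`` helper, the sample values, and the choice
to place the first piece in the least significant bits are all assumptions made
only for illustration. It shows that the expression for ``x`` keeps referring
to the same two fragment objects after ``x.hi`` is split again, so only the
definition of the split fragment changes.

.. code:: python

  from dataclasses import dataclass
  from typing import Callable, Sequence


  @dataclass
  class Fragment:
      """Hypothetical stand-in for a DIFragment: a size and a way to read it."""
      bits: int
      read: Callable[[], int]


  def composite(pieces: Sequence[Fragment], total_bits: int) -> int:
      """Concatenate pieces, first piece in the low bits (assumed order)."""
      value, shift = 0, 0
      for piece in pieces:
          value |= (piece.read() & ((1 << piece.bits) - 1)) << shift
          shift += piece.bits
      assert shift == total_bits, "pieces must cover the composite exactly"
      return value


  # x is described as a composite of two 32-bit fragments.
  x_lo = Fragment(32, lambda: 0x89ABCDEF)
  x_hi = Fragment(32, lambda: 0x01234567)


  def x_expr() -> int:
      """Toy analogue of DIOpArg(0), DIOpArg(1), DIOpComposite(2, i64)."""
      return composite((x_lo, x_hi), 64)


  before = x_expr()

  # Split x.hi into two 16-bit fragments. Only the definition of x_hi is
  # updated; x_expr still refers to the same x_lo and x_hi objects.
  x_hi_lo = Fragment(16, lambda: 0x4567)
  x_hi_hi = Fragment(16, lambda: 0x0123)
  x_hi.read = lambda: composite((x_hi_lo, x_hi_hi), 32)

  after = x_expr()

In this model ``before`` and ``after`` are the same value, because ``x_expr``
depends only on the identity of ``x_hi`` and not on how ``x_hi`` is computed.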
+ +Multiple Live Ranges For A Single Variable +------------------------------------------ + +Once out of SSA, or even while in SSA via memory, there may be multiple re-uses +of the same storage for different variables, and disjoint and/or overlapping +lifetimes for any single variable. This is modeled naturally by maintaining +*defs* and *kills* for these live ranges independently at, for example, +definitions and clobbers. + +.. code:: llvm + :number-lines: + + $r0 = MOV ... + DBG_DEF !2, $r0 + ... + SPILL %frame.index.0, $r0 + DBG_DEF !3, %frame.index.0 + ... + $r0 = MOV ; clobber + DBG_KILL !2 + DBG_DEF !6, $r0 + ... + $r1 = MOV ... + DBG_DEF !4, $r1 + ... + DBG_KILL !6 + DBG_KILL !4 + DBG_KILL !3 + RETURN + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(i32))) + !3 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(i32))) + !4 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(i32))) + !5 = !DILocalVariable("y", ...) + !6 = distinct !DILifetime(object: !5, location: !DIExpr(DIOpReferrer(i32))) + +In this example, ``$r0`` is referred to by disjoint ``DILifetime``\ s for +different variables. This implies the need for intrinsics/pseudo-instructions +to define the live range, as simply referring to an LLVM entity does not +provide enough information to reconstruct the live range. + +There is also a point where multiple ``DILifetime``\ s for the same variable +are live. This is needed to accurately represent cases where, for example, a +variable lives in both a register and in memory. The current +intrinsics/pseudo-instructions do not have the notion of live ranges for source +variables, and simply throw away at least one of the true lifetimes in these +cases. + +Global Variable Broken Into Two Scalars +--------------------------------------- + +.. 
code:: llvm + :number-lines: + + @g = i64, addrspace(1) !dbg.def !2 + + !llvm.dbg.cu = !{!0} + !llvm.dbg.retainedNodes = !{!3} + !0 = !DICompileUnit(..., globals: !{!1}) + !1 = !DIGlobalVariable("g") + !2 = distinct !DIFragment() + !3 = distinct !DILifetime( + object: !1, + location: !DIExpr(DIOpArg(0, ptr addrspace(1)), DIOpDeref(i64)), + argObjects: {!2} + ) + +Becomes: + +.. code:: llvm + :number-lines: + + @g.lo = i32, addrspace(1) !dbg.def !2 + @g.hi = i32, addrspace(1) !dbg.def !3 + + !llvm.dbg.cu = !{!0} + !llvm.dbg.retainedNodes = !{!4} + !0 = !DICompileUnit(..., globals: !{!1}) + !1 = !DIGlobalVariable("g") + !2 = distinct !DIFragment() + !3 = distinct !DIFragment() + !4 = distinct !DILifetime( + object: !1, + location: !DIExpr( + DIOpArg(0, ptr addrspace(1)), DIOpDeref(i32), + DIOpArg(1, ptr addrspace(1)), DIOpDeref(i32), + DIOpComposite(2, i64) + ), + argObjects: {!2, !3} + ) + +A function can specify the location of the global variable ``!1`` over some +range by simply defining bounded lifetime segments that also reference ``!1``. +These will override the "default" location description specified by the computed +lifetime segment ``!4``. + +Induction Variable +------------------ + +Starting with some program: + +.. code:: llvm + :number-lines: + + %x = i64 ... + call void @llvm.dbg.def(metadata !2, metadata i64 %x) + ... + %y = i64 ... + call void @llvm.dbg.def(metadata !4, metadata i64 %y) + ... + %i = i64 ... + call void @llvm.dbg.def(metadata !6, metadata i64 %i) + ... + call void @llvm.dbg.kill(metadata !6) + call void @llvm.dbg.kill(metadata !4) + call void @llvm.dbg.kill(metadata !2) + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(i64))) + !3 = !DILocalVariable("y", ...) + !4 = distinct !DILifetime(object: !3, location: !DIExpr(DIOpReferrer(i64))) + !5 = !DILocalVariable("i", ...)
+ !6 = distinct !DILifetime(object: !5, location: !DIExpr(DIOpReferrer(i64))) + +If analysis proves ``i`` over some range is equal to ``x + y``, the storage for +``i`` can be eliminated, and it can be materialized at every use. The +corresponding change needed in the debug information is: + +.. code:: llvm + :number-lines: + + %x = i64 ... + call void @llvm.dbg.def(metadata !2, metadata i64 %x) + ... + %y = i64 ... + call void @llvm.dbg.def(metadata !4, metadata i64 %y) + ... + call void @llvm.dbg.def(metadata !6, metadata i64 undef) + ... + call void @llvm.dbg.kill(metadata !6) + call void @llvm.dbg.kill(metadata !4) + call void @llvm.dbg.kill(metadata !2) + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(i64))) + !3 = !DILocalVariable("y", ...) + !4 = distinct !DILifetime(object: !3, location: !DIExpr(DIOpReferrer(i64))) + !5 = !DILocalVariable("i", ...) + !6 = distinct !DILifetime(object: !5, location: !DIExpr(DIOpArg(0, i64), DIOpArg(1, i64), DIOpAdd()), argObjects: {!1, !3}) + +For the given range, the value of ``i`` is computable so long as both ``x`` and +``y`` are live, the determination of which is left until the backend debug +information generation (for example, for old DWARF or for other debug +information formats), or until debugger runtime when the expression is evaluated +(for example, for DWARF with ``DW_OP_call`` and ``DW_TAG_dwarf_procedure``). +During compilation, this representation allows all updates to maintain the debug +information efficiently by making updates "shallow". + +In other cases, this can allow the debugger to provide locations for part of a +source variable, even when other parts are not available. This may be the case +if a ``struct`` with many fields is broken up during SRoA and the lifetimes of +each piece diverge. 
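
The availability rule for the induction variable example above, where ``i`` is
computable only while both ``x`` and ``y`` are live, can be sketched with a toy
stack evaluator. This is a minimal illustration in Python and not how LLVM
evaluates ``DIExpr``: the tuple encoding of operations, the ``evaluate``
function, and the ``live_values`` mapping are hypothetical names introduced
here, and wrapping the sum to 64 bits is an assumption.

.. code:: python

  from typing import Dict, Optional, Sequence, Tuple


  def evaluate(ops: Sequence[Tuple],
               arg_objects: Sequence[str],
               live_values: Dict[str, int],
               bits: int = 64) -> Optional[int]:
      """Evaluate a toy expression; return None if any input is not live.

      ("arg", n) pushes the current value of arg_objects[n].
      ("add",)   pops two entries and pushes their sum, wrapped to `bits`.
      """
      stack = []
      for op in ops:
          if op[0] == "arg":
              name = arg_objects[op[1]]
              if name not in live_values:
                  return None  # an input object has no live location here
              stack.append(live_values[name])
          elif op[0] == "add":
              rhs = stack.pop()
              lhs = stack.pop()
              stack.append((lhs + rhs) % (1 << bits))
          else:
              raise ValueError(f"unknown operation: {op[0]}")
      return stack.pop()


  # Toy analogue of DIOpArg(0, i64), DIOpArg(1, i64), DIOpAdd() with
  # argObjects {x, y}.
  i_expr = [("arg", 0), ("arg", 1), ("add",)]
  i_args = ["x", "y"]

  both_live = evaluate(i_expr, i_args, {"x": 40, "y": 2})
  y_not_live = evaluate(i_expr, i_args, {"x": 40})

Because the expression refers to ``x`` and ``y`` symbolically, a later change
to where ``x`` lives only changes the contents of ``live_values`` in this
model; ``i_expr`` itself is untouched.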
+ +Proven Constant +--------------- + +As a very similar example to the above induction variable case (in terms of the +updates needed in the debug information), the case where a variable is proven to +be a statically known constant over some range turns the following: + +.. code:: llvm + :number-lines: + + %x = i64 ... + call void @llvm.dbg.def(metadata !2, metadata i64 %x) + ... + call void @llvm.dbg.kill(metadata !2) + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpReferrer(i64))) + +Into: + +.. code:: llvm + :number-lines: + + call void @llvm.dbg.def(metadata !2, metadata i64 undef) + ... + call void @llvm.dbg.kill(metadata !2) + + !1 = !DILocalVariable("x", ...) + !2 = distinct !DILifetime(object: !1, location: !DIExpr(DIOpConstant(i64 ...))) + +Common Subexpression Elimination (CSE) +-------------------------------------- + +This is the example from `Bug 40628 - [DebugInfo@O2] Salvaged memory loads can +observe subsequent memory writes +`__: + +.. code:: c + :number-lines: + + int + foo(int *bar, int arg, int more) + { + int redundant = *bar; + int loaded = *bar; + arg &= more + loaded; + + *bar = 0; + + return more + *bar; + } + + int + main() { + int lala = 987654; + return foo(&lala, 1, 2); + } + +Which after ``SROA+mem2reg`` becomes (where ``redundant`` is ``!17`` and +``loaded`` is ``!16``): + +.. 
code:: llvm + :number-lines: + + ; Function Attrs: noinline nounwind uwtable + define dso_local i32 @foo(i32* %bar, i32 %arg, i32 %more) #0 !dbg !7 { + entry: + call void @llvm.dbg.value(metadata i32* %bar, metadata !13, metadata !DIExpression()), !dbg !18 + call void @llvm.dbg.value(metadata i32 %arg, metadata !14, metadata !DIExpression()), !dbg !18 + call void @llvm.dbg.value(metadata i32 %more, metadata !15, metadata !DIExpression()), !dbg !18 + %0 = load i32, i32* %bar, align 4, !dbg !19, !tbaa !20 + call void @llvm.dbg.value(metadata i32 %0, metadata !16, metadata !DIExpression()), !dbg !18 + %1 = load i32, i32* %bar, align 4, !dbg !24, !tbaa !20 + call void @llvm.dbg.value(metadata i32 %1, metadata !17, metadata !DIExpression()), !dbg !18 + %add = add nsw i32 %more, %1, !dbg !25 + %and = and i32 %arg, %add, !dbg !26 + call void @llvm.dbg.value(metadata i32 %and, metadata !14, metadata !DIExpression()), !dbg !18 + store i32 0, i32* %bar, align 4, !dbg !27, !tbaa !20 + %2 = load i32, i32* %bar, align 4, !dbg !28, !tbaa !20 + %add1 = add nsw i32 %more, %2, !dbg !29 + ret i32 %add1, !dbg !30 + } + +And previously led to this after ``EarlyCSE``, which removes the redundant load +from ``%bar``: + +.. code:: llvm + :number-lines: + + define dso_local i32 @foo(i32* %bar, i32 %arg, i32 %more) #0 !dbg !7 { + entry: + call void @llvm.dbg.value(metadata i32* %bar, metadata !13, metadata !DIExpression()), !dbg !18 + call void @llvm.dbg.value(metadata i32 %arg, metadata !14, metadata !DIExpression()), !dbg !18 + call void @llvm.dbg.value(metadata i32 %more, metadata !15, metadata !DIExpression()), !dbg !18 + + ; This is not accurate to begin with, as a debugger which modifies + ; `redundant` will erroneously update the pointee of the parameter `bar`. 
+ call void @llvm.dbg.value(metadata i32* %bar, metadata !16, metadata !DIExpression(DW_OP_deref)), !dbg !18 + + %0 = load i32, i32* %bar, align 4, !dbg !19, !tbaa !20 + call void @llvm.dbg.value(metadata i32 %0, metadata !17, metadata !DIExpression()), !dbg !18 + %add = add nsw i32 %more, %0, !dbg !24 + call void @llvm.dbg.value(metadata i32 undef, metadata !14, metadata !DIExpression()), !dbg !18 + + ; This store "clobbers" the debug location description for `redundant`, such + ; that a debugger about to execute the following `ret` will erroneously + ; report `redundant` as equal to `0` when the source semantics have it still + ; equal to the value pointed to by `bar` on entry. + store i32 0, i32* %bar, align 4, !dbg !25, !tbaa !20 + ret i32 %more, !dbg !26 + } + +But now becomes (conservatively): + +.. code:: llvm + :number-lines: + + define dso_local i32 @foo(i32* %bar, i32 %arg, i32 %more) #0 !dbg !7 { + entry: + call void @llvm.dbg.value(metadata i32* %bar, metadata !13, metadata !DIExpression()), !dbg !18 + call void @llvm.dbg.value(metadata i32 %arg, metadata !14, metadata !DIExpression()), !dbg !18 + call void @llvm.dbg.value(metadata i32 %more, metadata !15, metadata !DIExpression()), !dbg !18 + + ; The above mentioned patch for PR40628 adds special treatment, dropping + ; the debug information for `redundant` completely in this case, making + ; this conservatively correct. 
+ call void @llvm.dbg.value(metadata i32 undef, metadata !16, metadata !DIExpression()), !dbg !18 + + %0 = load i32, i32* %bar, align 4, !dbg !19, !tbaa !20 + call void @llvm.dbg.value(metadata i32 %0, metadata !17, metadata !DIExpression()), !dbg !18 + %add = add nsw i32 %more, %0, !dbg !24 + call void @llvm.dbg.value(metadata i32 undef, metadata !14, metadata !DIExpression()), !dbg !18 + store i32 0, i32* %bar, align 4, !dbg !25, !tbaa !20 + ret i32 %more, !dbg !26 + } + +Effectively at the point of the CSE eliminating the load, it conservatively +marks the source variable ``redundant`` as optimized out. + +It seems like the semantics that CSE really wants to encode in the debug +intrinsics is that, after the point at which the common load occurs, the +location for both ``redundant`` and ``loaded`` is ``%0``, and that they are both +read-only. It seems like it needs to prove this to combine them, and if it can +only combine them over some range, it can insert additional live ranges to +describe their separate locations outside of that range. The implicit pointer +example further suggests why this may need to be the case, because at the time +the implicit pointer is created, it is not known which source variable to bind +to in order to get the multiple lifetimes in this design. + +This seems to be supported by the fact that even in current LLVM trunk, with the +more conservative change to mark the ``redundant`` variable as ``undef`` in the +above case, changing the source to modify ``redundant`` after the load results +in both ``redundant`` and ``loaded`` referring to the same location, and both +being read-write. A modification of ``redundant`` in the debugger before the use +of ``loaded`` is permitted and would have the effect of also updating +``loaded``. An example of the modified source needed to cause this is: + +.. 
code:: c + :number-lines: + + int + foo(int *bar, int arg, int more) + { + int redundant = *bar; + int loaded = *bar; + arg &= more + loaded; // A store to redundant here affects loaded. + + *bar = redundant; // The use and subsequent modification of `redundant` here + redundant = 1; // effectively circumvents the patch for PR40628. + + return more + *bar; + } + + int + main() { + int lala = 987654; + return foo(&lala, 1, 2); + } + +Note that after ``EarlyCSE``, this example produces the same location +description for both ``redundant`` and ``loaded`` (metadata ``!17`` and +``!18``): + +.. code:: llvm + :number-lines: + + define dso_local i32 @foo(i32* %bar, i32 %arg, i32 %more) #0 !dbg !8 { + entry: + call void @llvm.dbg.value(metadata i32* %bar, metadata !14, metadata !DIExpression()), !dbg !19 + call void @llvm.dbg.value(metadata i32 %arg, metadata !15, metadata !DIExpression()), !dbg !19 + call void @llvm.dbg.value(metadata i32 %more, metadata !16, metadata !DIExpression()), !dbg !19 + %0 = load i32, i32* %bar, align 4, !dbg !20, !tbaa !21 + + ; The same location is reused for both source variables, without it being + ; marked read-only (namely without it being made into an implicit location + ; description). + call void @llvm.dbg.value(metadata i32 %0, metadata !17, metadata !DIExpression()), !dbg !19 + call void @llvm.dbg.value(metadata i32 %0, metadata !18, metadata !DIExpression()), !dbg !19 + + ; Modifications to either source variable in a debugger affect the other from + ; this point on in the function. 
+ %add = add nsw i32 %more, %0, !dbg !25 + call void @llvm.dbg.value(metadata i32 undef, metadata !15, metadata !DIExpression()), !dbg !19 + call void @llvm.dbg.value(metadata i32 1, metadata !17, metadata !DIExpression()), !dbg !19 + ret i32 %add, !dbg !26 + } + +*[Note: To see this result, i386 is required; x86_64 seems to do even more +optimization which eliminates both* ``loaded`` *and* ``redundant``\ *.]* + +Fixing this issue in the current debug information is technically possible, but +as noted by the LLVM community in the review for the attempted conservative +patch: + + *"this isn’t something that can be fixed without a lot of work, thus it’s + safer to turn off for now."* + +The LLVM extensions make this case tractable to support with full generality and +composability with other optimizations. The expected result of ``EarlyCSE`` +would be: + +.. code:: llvm + :number-lines: + + define dso_local i32 @foo(i32* %bar, i32 %arg, i32 %more) #0 !dbg !8 { + entry: + call void @llvm.dbg.def(metadata ptr %bar, metadata !19), !dbg !19 + call void @llvm.dbg.def(metadata i32 %arg, metadata !20), !dbg !19 + call void @llvm.dbg.def(metadata i32 %more, metadata !21), !dbg !19 + %0 = load i32, i32* %bar, align 4, !dbg !20, !tbaa !21 + + call void @llvm.dbg.def(metadata i32 %0, metadata !22), !dbg !19 + call void @llvm.dbg.def(metadata i32 %0, metadata !23), !dbg !19 + + %add = add nsw i32 %more, %0, !dbg !25 + ret i32 %add, !dbg !26 + } + + !14 = !DILocalVariable("bar", ...) + !15 = !DILocalVariable("arg", ...) + !16 = !DILocalVariable("more", ...) + !17 = !DILocalVariable("redundant", ...) + !18 = !DILocalVariable("loaded", ...) 
+ !19 = distinct !DILifetime(object: !14, location: !DIExpr(DIOpReferrer(ptr))) + !20 = distinct !DILifetime(object: !15, location: !DIExpr(DIOpReferrer(i32))) + !21 = distinct !DILifetime(object: !16, location: !DIExpr(DIOpReferrer(i32))) + !22 = distinct !DILifetime(object: !17, location: !DIExpr(DIOpReferrer(i32), DIOpRead(i32))) + !23 = distinct !DILifetime(object: !18, location: !DIExpr(DIOpReferrer(i32), DIOpRead(i32))) + +Which accurately describes that both ``redundant`` and ``loaded`` are read-only +after the common load. + +Divergent Lane PC +----------------- + +For AMDGPU, the ``DW_AT_LLVM_lane_pc`` attribute is used to specify the program +location of the separate lanes of a SIMT thread. + +If the lane is an active lane, then this will be the same as the current program +location. + +If the lane is inactive, but was active on entry to the subprogram, then this is +the program location in the subprogram at which execution of the lane is +conceptually positioned. + +If the lane was not active on entry to the subprogram, then this will be the +undefined location. A client debugger can check if the lane is part of a valid +work-group by checking that the lane is in the range of the associated +work-group within the grid, accounting for partial work-groups. If it is not, +then the debugger can omit any information for the lane. Otherwise, the debugger +may repeatedly unwind the stack and inspect the ``DW_AT_LLVM_lane_pc`` of the +calling subprogram until it finds a non-undefined location. Conceptually the +lane only has the call frames for which it has a non-undefined +``DW_AT_LLVM_lane_pc``. + +The following example illustrates how the AMDGPU backend can generate a DWARF +location list expression for the nested ``IF/THEN/ELSE`` structures of the +following subprogram pseudo code for a target with 64 lanes per wavefront. + +..
code:: llvm + :number-lines: + + SUBPROGRAM X + BEGIN + a; + IF (c1) THEN + b; + IF (c2) THEN + c; + ELSE + d; + ENDIF + e; + ELSE + f; + ENDIF + g; + END + +The AMDGPU backend may generate the following pseudo LLVM MIR to manipulate the +execution mask (``EXEC``) to linearize the control flow. The condition is +evaluated to make a mask of the lanes for which the condition evaluates to true. +First the ``THEN`` region is executed by setting the ``EXEC`` mask to the +logical ``AND`` of the current ``EXEC`` mask with the condition mask. Then the +``ELSE`` region is executed by setting the ``EXEC`` mask to the logical ``AND`` +of the negated ``EXEC`` mask with the ``EXEC`` mask saved at the start of the +region. After the ``IF/THEN/ELSE`` region the ``EXEC`` mask is restored to the +value it had at the beginning of the region. This is shown below. Other +approaches are possible, but the basic concept is the same. + +.. code:: llvm + :number-lines: + + %lex_start: + a; + %1 = EXEC + %2 = c1 + %lex_1_start: + EXEC = %1 & %2 + %lex_1_then: + b; + %3 = EXEC + %4 = c2 + %lex_1_1_start: + EXEC = %3 & %4 + %lex_1_1_then: + c; + EXEC = ~EXEC & %3 + %lex_1_1_else: + d; + EXEC = %3 + %lex_1_1_end: + e; + EXEC = ~EXEC & %1 + %lex_1_else: + f; + EXEC = %1 + %lex_1_end: + g; + %lex_end: + +To create the DWARF location list expression that defines the location +description of a vector of lane program locations, the LLVM MIR ``DBG_DEF`` +pseudo instruction can be used to annotate the linearized control flow. This can +be done by defining a ``DIFragment`` for the lane PC and using it as the +``activeLanePC`` parameter of the corresponding ``DISubprogram`` of the function +being described. The DWARF location list expression created for it is used as +the value of the ``DW_AT_LLVM_lane_pc`` attribute on the subprogram’s debugging +information entry.
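
The ``EXEC`` mask manipulation in the linearized listing above can be modelled
with plain integers, one bit per lane. The following Python sketch is only an
illustration of the mask arithmetic described in the text, not backend code:
the ``if_then_else_masks`` function and the sample mask values are assumptions,
and only the low four of the 64 lanes are used to keep the example readable.

.. code:: python

  LANES = 64
  ALL_LANES = (1 << LANES) - 1


  def if_then_else_masks(exec_mask: int, cond_mask: int):
      """Return the EXEC values for THEN, for ELSE, and after the region."""
      saved = exec_mask                          # %1 = EXEC
      then_mask = saved & cond_mask              # EXEC = %1 & %2
      else_mask = (~then_mask & ALL_LANES) & saved   # EXEC = ~EXEC & %1
      restored = saved                           # EXEC = %1
      return then_mask, else_mask, restored


  # Outer region: lanes 0-3 are active on entry, c1 holds for lanes 0 and 2.
  outer_then, outer_else, outer_end = if_then_else_masks(0b1111, 0b0101)

  # Nested region runs under the outer THEN mask; c2 holds for lanes 0 and 1.
  inner_then, inner_else, inner_end = if_then_else_masks(outer_then, 0b0011)

In this model the ``THEN`` and ``ELSE`` masks of a region never overlap and
together cover exactly the lanes that were active on entry to the region, which
is why a lane that is inactive inside a region can still be assigned a
conceptual program location.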
+ +A ``DIFragment`` is defined for each well nested structured control flow region +which provides the conceptual lane program location for a lane if it is not +active (namely it is divergent). The ``DIFragment`` for each region has a single +computed ``DILifetime`` whose location expression conceptually inherits the +value of the immediately enclosing region and modifies it according to the +semantics of the region. + +By having a separate ``DIFragment`` for each region, they can be reused to +define the value for any nested region. This reduces the total size of the DWARF +operation expressions. + +A "bounded divergent lane PC" ``DIFragment`` is defined which computes the +program location for each lane assuming they are divergent at every instruction +in the function. This fragment has one bounded lifetime for each region. Each +bounded lifetime specifies a single ``DIFragment`` for a region and is active +over a disjoint range of the function instructions corresponding to that region. +Together the lifetimes cover all instructions of the function, such that at +every PC in the function exactly one lifetime is active. + +For an ``IF/THEN/ELSE`` region, the divergent program location is at the start +of the region for the ``THEN`` region since it is executed first. For the +``ELSE`` region, the divergent program location is at the end of the +``IF/THEN/ELSE`` region since the ``THEN`` region has completed. + +The lane PC fragment is then defined with an expression that takes the bounded +divergent lane PC and modifies it by inserting the current program location for +each lane that the ``EXEC`` mask indicates is active. + +The following provides an example using pseudo LLVM MIR. + +.. code:: llvm + :number-lines: + + ; NOTE: This listing is written in a pseudo LLVM MIR, as this debug information + ; will be inserted as part of inserting EXEC manipulation into LLVM MIR. + ; + ; This pseudo-MIR uses named metadata identifiers (e.g. 
!foo) to identify + ; unnamed metadata (e.g. !0). To translate to MIR assign each unique named + ; metadata identifier a monotonically increasing unnamed metadata identifier, + ; then replace all references to each named metadata identifier with its + ; corresponding unnamed metadata identifier. + ; + ; The identifiers are named as a dot (`.`) separated list of elements, + ; ending with a tag corresponding to the type of metadata they identify. + ; + ; In MIR a `!DIExpr` is always printed inline at its use, even though it is + ; internally uniqued and shared by all uses of the same expression. In this + ; pseudo-MIR we break this convention and write the expressions out-of-line + ; in some cases to emphasize where sharing occurs and to shorten the listing. + + lex_start: + ; NOTE: These lifetimes for the PC/EXEC registers define the typical, + ; default case of referring directly to the physical register. For cases + ; like WQM where the physical EXEC and "logical" EXEC are not the same, + ; this will be overridden by defining a bounded lifetime for + ; !pc.fragment/!exec.fragment.
+ DBG_DEF !pc.physical.lifetime, $PC + DBG_DEF !exec.physical.lifetime, $EXEC + DBG_DEF !bounded_divergent_lane_pc.lex.a.lifetime, $noreg + a; + %1 = EXEC; + DBG_DEF !save_exec.lex_1.lifetime, i64 %1 + %2 = c1; + DBG_KILL !bounded_divergent_lane_pc.lex.a.lifetime + lex_1_start: + DBG_LABEL !lex_1_start.label + EXEC = %1 & %2; + lex_1_then: + DBG_DEF !bounded_divergent_lane_pc.lex_1_then.a.lifetime, $noreg + b; + %3 = EXEC; + DBG_DEF !save_exec.lex_1_1.lifetime, i64 %3 + %4 = c2; + DBG_KILL !bounded_divergent_lane_pc.lex_1_then.a.lifetime + lex_1_1_start: + DBG_LABEL !lex_1_1_start.label + EXEC = %3 & %4; + lex_1_1_then: + DBG_DEF !bounded_divergent_lane_pc.lex_1_1_then.a.lifetime, $noreg + c; + DBG_KILL !bounded_divergent_lane_pc.lex_1_1_then.a.lifetime + EXEC = ~EXEC & %3; + lex_1_1_else: + DBG_DEF !bounded_divergent_lane_pc.lex_1_1_else.a.lifetime, $noreg + d; + DBG_KILL !bounded_divergent_lane_pc.lex_1_1_else.a.lifetime + EXEC = %3; + DBG_KILL !save_exec.lex_1_1.lifetime + lex_1_1_end: + DBG_LABEL !lex_1_1_end.label + DBG_DEF !bounded_divergent_lane_pc.lex_1_then.b.lifetime, $noreg + e; + DBG_KILL !bounded_divergent_lane_pc.lex_1_then.b.lifetime + EXEC = ~EXEC & %1; + lex_1_else: + DBG_DEF !bounded_divergent_lane_pc.lex_1_else.a.lifetime, $noreg + f; + DBG_KILL !bounded_divergent_lane_pc.lex_1_else.a.lifetime + EXEC = %1; + DBG_KILL !save_exec.lex_1.lifetime + lex_1_end: + DBG_LABEL !lex_1_end.label + DBG_DEF !bounded_divergent_lane_pc.lex.b.lifetime, $noreg + g; + lex_end: + + ;; Labels + !lex_1_start.label = distinct !DExprCode() + !lex_1_1_start.label = distinct !DExprCode() + !lex_1_1_end.label = distinct !DExprCode() + !lex_1_end.label = distinct !DExprCode() + + ;; Saved EXEC Mask Fragments + ; These track the value of the EXEC mask saved on entry to each `IF/THEN/ELSE` + ; region. 
The saved mask identifies the lanes to be updated when defining the + ; computed divergent_lane_pc for a given lexical block (or, put another way, + ; the negation of the saved mask identifies the lanes which are not updated). + !save_exec.lex_1.fragment = distinct !DIFragment() + !save_exec.lex_1.lifetime = distinct !DILifetime( + object: !save_exec.lex_1.fragment, + location: !DIExpr(DIOpReferrer(i64)) + ) + !save_exec.lex_1_1.fragment = distinct !DIFragment() + !save_exec.lex_1_1.lifetime = distinct !DILifetime( + object: !save_exec.lex_1_1.fragment, + location: !DIExpr(DIOpReferrer(i64)) + ) + + ;; Logical and Physical Register Fragments + ; NOTE: We refer to the "logical" EXEC, `!exec.fragment`, in other expressions. + ; This may be computed in cases where the physical EXEC was updated to + ; implement e.g. whole-quad-mode. Referring to this fragment makes the uses + ; transparently support this. The same approach is applied for the PC. + !pc.fragment = distinct !DIFragment() + !pc.default.lifetime = distinct !DILifetime( + object: !pc.fragment, + location: !DIExpr(DIOpArg(i64)), + argObjects: {!pc.physical.fragment} + ) + !pc.physical.fragment = distinct !DIFragment() + !pc.physical.lifetime = distinct !DILifetime( + object: !pc.physical.fragment, + location: !DIExpr(DIOpReferrer(i64)) + ) + !exec.fragment = distinct !DIFragment() + !exec.default.lifetime = distinct !DILifetime( + object: !exec.fragment, + location: !DIExpr(DIOpArg(i64)), + argObjects: {!exec.physical.fragment} + ) + !exec.physical.fragment = distinct !DIFragment() + !exec.physical.lifetime = distinct !DILifetime( + object: !exec.physical.fragment, + location: !DIExpr(DIOpReferrer(i64)) + ) + + ;; Bounded Divergent Lane PC + ; This fragment has disjoint lifetimes which cover the entire PC range of the + ; function. 
It contains the divergent_lane_pc for all lanes which are + ; divergent, with unspecified values present in active lanes (as an artifact of + ; the current implementation, the active lanes are assigned the same value as + ; the divergent lanes which were active on entry to the current `IF/THEN/ELSE` + ; region, but this is neither guaranteed nor required). + !bounded_divergent_lane_pc.fragment = distinct !DIFragment() + ; The argObjects to !bounded_divergent_lane_pc.expr are: + ; {<64 x i64> lane_pc_vec} + !bounded_divergent_lane_pc.expr = !DIExpr(DIOpArg(<64 x i64>)) + !bounded_divergent_lane_pc.lex.a.lifetime = distinct !DILifetime( + object: !bounded_divergent_lane_pc.fragment, + location: !bounded_divergent_lane_pc.expr, + argObjects: {!divergent_lane_pc.lex.fragment} + ) + !bounded_divergent_lane_pc.lex_1_then.a.lifetime = distinct !DILifetime( + object: !bounded_divergent_lane_pc.fragment, + location: !bounded_divergent_lane_pc.expr, + argObjects: {!divergent_lane_pc.lex_1_then.fragment} + ) + !bounded_divergent_lane_pc.lex_1_1_then.a.lifetime = distinct !DILifetime( + object: !bounded_divergent_lane_pc.fragment, + location: !bounded_divergent_lane_pc.expr, + argObjects: {!divergent_lane_pc.lex_1_1_then.fragment} + ) + !bounded_divergent_lane_pc.lex_1_1_else.a.lifetime = distinct !DILifetime( + object: !bounded_divergent_lane_pc.fragment, + location: !bounded_divergent_lane_pc.expr, + argObjects: {!divergent_lane_pc.lex_1_1_else.fragment} + ) + !bounded_divergent_lane_pc.lex_1_then.b.lifetime = distinct !DILifetime( + object: !bounded_divergent_lane_pc.fragment, + location: !bounded_divergent_lane_pc.expr, + argObjects: {!divergent_lane_pc.lex_1_then.fragment} + ) + !bounded_divergent_lane_pc.lex_1_else.a.lifetime = distinct !DILifetime( + object: !bounded_divergent_lane_pc.fragment, + location: !bounded_divergent_lane_pc.expr, + argObjects: {!divergent_lane_pc.lex_1_else.fragment} + ) + !bounded_divergent_lane_pc.lex.b.lifetime = distinct !DILifetime( + 
object: !bounded_divergent_lane_pc.fragment, + location: !bounded_divergent_lane_pc.expr, + argObjects: {!divergent_lane_pc.lex.fragment} + ) + + ; TODO: Maybe add a property of DIFragment that asserts it should never have + ; more than a single location description for any PC + + ; TODO: To easily translate Extend, Select, Read, etc. + ; into DWARF, they will need a type parameter. Should we add a type to just the + ; operations which correspond to a DWARF operation that needs the type/size? Or + ; should we just add types to all operations? + + ;; Computed Divergent Lane PC Fragments + !divergent_lane_pc.lex.fragment = distinct !DIFragment() + !divergent_lane_pc.lex.lifetime = distinct !DILifetime( + object: !divergent_lane_pc.lex.fragment, + location: !DIExpr(DIOpConstant(i64 undef), DIOpExtend(64)) + ) + ; The argObjects to `!select_lanes.expr` are: + ; {<64 x i64> starting_lane_pc_vec, i64 pc_value, i64 mask} + !select_lanes.expr = !DIExpr( + DIOpArg(0, <64 x i64>), + DIOpArg(1, i64), DIOpExtend(64, i64), + DIOpArg(2, i64), + DIOpSelect(64, i64) + ) + ; TODO: We have the issue of: how do we ensure we have a value when we need + ; it for DWARF, for example DIOpSelect will need to ensure the top element of + ; the stack is a value when evaluating the final DWARF, but this violates the + ; "context insensitive" property we want for the operations. + ; We can work around this by emitting "unoptimized" DWARF where e.g. every + ; implicit location description in the LLVM representation actually maps to an + ; implicit location description being pushed on the DWARF stack (e.g. we lower + ; `... DIOpConstant(i64 42) DIOpSelect()` to `... DW_OP_uconst 42, + ; DW_OP_stack_value, DW_OP_deref, DW_OP_select_bit_piece` instead of just `...
+ ; DW_OP_uconst 42, DW_OP_select_bit_piece`) + !divergent_lane_pc.lex_1_then.fragment = distinct !DIFragment() + !divergent_lane_pc.lex_1_then.lifetime = distinct !DILifetime( + object: !divergent_lane_pc.lex_1_then.fragment, + location: !select_lanes.expr, + argObjects: { + !divergent_lane_pc.lex.fragment, + !lex_1_start.label, + !save_exec.lex_1.fragment + } + ) + !divergent_lane_pc.lex_1_1_then.fragment = distinct !DIFragment() + !divergent_lane_pc.lex_1_1_then.lifetime = distinct !DILifetime( + object: !divergent_lane_pc.lex_1_1_then.fragment, + location: !select_lanes.expr, + argObjects: { + !divergent_lane_pc.lex.fragment, + !lex_1_1_start.label, + !save_exec.lex_1_1.fragment + } + ) + !divergent_lane_pc.lex_1_1_else.fragment = distinct !DIFragment() + !divergent_lane_pc.lex_1_1_else.lifetime = distinct !DILifetime( + object: !divergent_lane_pc.lex_1_1_else.fragment, + location: !select_lanes.expr, + argObjects: { + !divergent_lane_pc.lex.fragment, + !lex_1_1_end.label, + !save_exec.lex_1_1.fragment + } + ) + !divergent_lane_pc.lex_1_else.fragment = distinct !DIFragment() + !divergent_lane_pc.lex_1_else.lifetime = distinct !DILifetime( + object: !divergent_lane_pc.lex_1_else.fragment, + location: !select_lanes.expr, + argObjects: { + !divergent_lane_pc.lex.fragment, + !lex_1_end.label, + !save_exec.lex_1.fragment + } + ) + + ;; Active Lane PC + !active_lane_pc.fragment = distinct !DIFragment() + !active_lane_pc.lifetime = distinct !DILifetime( + object: !active_lane_pc.fragment, + location: !select_lanes.expr, + argObjects: { + !bounded_divergent_lane_pc.fragment, + !pc.fragment, + !exec.fragment + } + ) + + ;; Subprogram + !subprogram = !DISubprogram(..., + activeLanePC: !active_lane_pc.fragment, + retainedNodes: !{ + !pc.default.lifetime, + !exec.default.lifetime, + !divergent_lane_pc.lex_1_then.lifetime, + !divergent_lane_pc.lex_1_1_then.lifetime, + !divergent_lane_pc.lex_1_1_else.lifetime, + !divergent_lane_pc.lex_1_else.lifetime, + 
!active_lane_pc.lifetime, + !lex_1_start.label, + !lex_1_1_start.label, + !lex_1_1_end.label, + !lex_1_end.label + } + ) + +Fragments ``!save_exec.lex_1.fragment`` and ``!save_exec.lex_1_1.fragment`` are +created for the execution masks saved on entry to a region. Using the +``DBG_DEF`` pseudo instruction, location list entries will be created that +describe where the artificial variables are allocated at any given program +location. The compiler may allocate them to registers or spill them to memory. + +The fragments for each region use the values of the saved execution mask +artificial variables to only update the lanes that are active on entry to the +region. All other lanes retain the value of the enclosing region where they were +last active. If they were not active on entry to the subprogram, then they will +have the undefined location description. + +Other structured control flow regions can be handled similarly. For example, +loops would set the divergent program location for the region at the end of the +loop. Any lanes active will be in the loop, and any lanes not active must have +exited the loop. + +An ``IF/THEN/ELSEIF/ELSEIF/...`` region can be treated as a nest of +``IF/THEN/ELSE`` regions. + +Other Ideas +=========== + +Translating To DWARF +-------------------- + +.. TODO:: + + Define algorithm for computing DWARF location descriptions and loclists. + + - Define rule for implicit pointers (``DIOpAddrof`` operation applied to a + ``DIOpReferrer`` operation): + + - Look for a compatible, existing program object. + - If not, generate an artificial one. + - This could be bubbled up to DWARF itself, to allow implicits to hold + arbitrary location descriptions, eliminating the need for the + artificial variable, and making translation simpler. + + - Define rule for ``DIFragment``: + + - If referenced by multiple ``argObjects``, then use a + ``DW_TAG_DWARF_procedure``.
+ - If only referenced by a ``DIVariable`` or ``DIComposite`` field, then + use ``expr`` or ``loclist`` form that specifies the location + description expression directly. + + - Define rule for computed lifetime: + + - If referenced ``DIObject`` has no bounded lifetime segments, then use + ``expr`` form. + - If referenced ``DIObject`` has bounded lifetime segments, then use + ``loclist`` form. + +Translating To PDB (CodeView) +----------------------------- + +.. TODO:: + + Define. + +Comparison With GCC +------------------- + +.. TODO:: + + Understand how this compares to what GCC is doing? + +Example Ideas +------------- + +LDS Variables +~~~~~~~~~~~~~ + +.. TODO:: + + LDS variables, one variable but multiple kernels with distinct lifetimes, is + that possible in LLVM? + + Could allow the ``llvm.dbg.def`` intrinsic to refer to a global and use that + to define live ranges which live in functions and refer to storage outside of + the function. + + I would expect that LDS variables would have no ``!dbg.default`` and instead + have ``llvm.dbg.def`` in each function that can access it. The bounded + lifetime segment would have an expression that evaluates to the location of + the LDS variable in the specific subprogram. For a kernel it would likely be + an absolute address in the LDS address space. Each kernel may have a + different address. In functions that can be called from multiple kernels it + may be an expression that uses the LDS indirection variables to determine the + actual LDS address. + +Make Sure The Non-SSA MIR Form Works With def/kill Scheme +~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ + +.. TODO:: + + Make sure the non-SSA MIR form works with def/kill scheme, and additionally + confirm why we do not seem to need the work upstream that is trying to move + to referring to an instruction rather than a register? See `[llvm-dev] [RFC] + DebugInfo: A different way of specifying variable locations post-isel + `__. 
+ +References +========== + +1. `[LLVMdev] [RFC] Separating Metadata from the Value hierarchy (David + Blaikie) + `__ + +2. `[LLVMdev] [RFC] Separating Metadata from the Value hierarchy + `__ + +3. `[llvm-dev] Proposal for multi location debug info support in LLVM IR `__ + +4. `[llvm-dev] Proposal for multi location debug info support in LLVM IR `__ + +5. `Multi Location Debug Info support for LLVM `__ + +6. `D81852 [DebugInfo] Update MachineInstr interface to better support variadic DBG_VALUE instructions `__ + +7. `D70601 Disallow DIExpressions with shift operators from being fragmented `__ + +8. `D57962 [DebugInfo] PR40628: Don’t salvage load operations `__ + +9. `Bug 40628 - [DebugInfo@O2] Salvaged memory loads can observe subsequent memory writes `__ + +10. :doc:`LangRef` + + 1. :ref:`wellformed` + 2. :ref:`typesystem` + 3. :ref:`globalvars` + 4. :ref:`DICompositeType` + 5. :ref:`DILocalVariable` + 6. :ref:`DIGlobalVariable` + 7. :ref:`DICompileUnit` + 8. :ref:`DISubprogram` + 9. :ref:`DILabel` + +11. :doc:`AMDGPUDwarfExtensionsForHeterogeneousDebugging` + + 1. :ref:`amdgpu-dwarf-expressions` + 2. :ref:`amdgpu-dwarf-location-list-expressions` + 3. :ref:`amdgpu-dwarf-location-description` + 4. :ref:`amdgpu-dwarf-expression-evaluation-context` + +12. :doc:`AMDGPUUsage` + + 1. :ref:`amdgpu-dwarf-dw-at-llvm-lane-pc` diff --git a/llvm/docs/AMDGPUUsage.rst b/llvm/docs/AMDGPUUsage.rst --- a/llvm/docs/AMDGPUUsage.rst +++ b/llvm/docs/AMDGPUUsage.rst @@ -26,6 +26,7 @@ AMDGPUInstructionSyntax AMDGPUInstructionNotation AMDGPUDwarfExtensionsForHeterogeneousDebugging + AMDGPULLVMExtensionsForHeterogeneousDebugging AMDGPUDwarfExtensionAllowLocationDescriptionOnTheDwarfExpressionStack/AMDGPUDwarfExtensionAllowLocationDescriptionOnTheDwarfExpressionStack Introduction @@ -1773,6 +1774,10 @@ :doc:`AMDGPUDwarfExtensionsForHeterogeneousDebugging` that are made available in DWARF Version 4 and DWARF Version 5 as an LLVM vendor extension. 
+AMDGPU uses LLVM features defined in +:doc:`AMDGPULLVMExtensionsForHeterogeneousDebugging` to implement the generation +of DWARF. + This section defines the AMDGPU target architecture specific DWARF mappings. .. _amdgpu-dwarf-register-identifier: diff --git a/llvm/docs/UserGuides.rst b/llvm/docs/UserGuides.rst --- a/llvm/docs/UserGuides.rst +++ b/llvm/docs/UserGuides.rst @@ -246,6 +246,14 @@ This document describes DWARF extensions to support heterogeneous debugging for targets such as the AMDGPU backend. +:doc:`AMDGPULLVMExtensionsForHeterogeneousDebugging` + This document describes proposed LLVM Debug Information changes to support + heterogeneous debugging for targets such as the AMDGPU backend, and to + improve coverage and correctness when enabling optimizations for all + targets. This is based on concepts from + :doc:`AMDGPUDwarfExtensionsForHeterogeneousDebugging` but is not + fundamentally dependent on it. + :doc:`AMDGPUDwarfExtensionAllowLocationDescriptionOnTheDwarfExpressionStack/AMDGPUDwarfExtensionAllowLocationDescriptionOnTheDwarfExpressionStack` This document describes a DWARF extension to allow location descriptions on the DWARF expression stack. It is part of