diff --git a/llvm/docs/HowToUpdateDebugInfo.rst b/llvm/docs/HowToUpdateDebugInfo.rst new file mode 100644 --- /dev/null +++ b/llvm/docs/HowToUpdateDebugInfo.rst @@ -0,0 +1,351 @@ +======================================================= +How to Update Debug Info: A Guide for LLVM Pass Authors +======================================================= + +.. contents:: + :local: + +Introduction +============ + +Certain kinds of code transformations can inadvertently result in a loss of +debug info, or worse, make debug info misrepresent the state of a program. + +This document specifies how to correctly update debug info in various kinds of +code transformations, and offers suggestions for how to create targeted debug +info tests for arbitrary transformations. + +For more on the philosophy behind LLVM debugging information, see +:doc:`SourceLevelDebugging`. + +IR-level transformations +======================== + +Deleting an Instruction +----------------------- + +When an ``Instruction`` is deleted, its debug uses change to ``undef``. This is +a loss of debug info: the value of one or more source variables becomes +unavailable, starting with the ``llvm.dbg.value(undef, ...)``. When there is no +way to reconstitute the value of the lost instruction, this is the best +possible outcome. However, it's often possible to do better: + +* If the dying instruction can be RAUW'd, do so. The + ``Value::replaceAllUsesWith`` API transparently updates debug uses of the + dying instruction to point to the replacement value. + +* If the dying instruction cannot be RAUW'd, call + ``llvm::salvageDebugInfoOrMarkUndef`` on it. This makes a best-effort attempt + to rewrite debug uses of the dying instruction by describing its effect as a + ``DIExpression``. + +* If one of the **operands** of a dying instruction would become trivially + dead, use ``llvm::replaceAllDbgUsesWith`` to rewrite the debug uses of that + operand. Consider the following example function: + +.. 
code-block:: llvm + + define i16 @foo(i16 %a) { + %b = sext i16 %a to i32 + %c = and i32 %b, 15 + call void @llvm.dbg.value(metadata i32 %c, ...) + %d = trunc i32 %c to i16 + ret i16 %d + } + +Now, here's what happens after the unnecessary truncation instruction ``%d`` is +replaced with a simplified instruction: + +.. code-block:: llvm + + define i16 @foo(i16 %a) { + call void @llvm.dbg.value(metadata i32 undef, ...) + %simplified = and i16 %a, 15 + ret i16 %simplified + } + +Note that after deleting ``%d``, all uses of its operand ``%c`` become +trivially dead. The debug use which used to point to ``%c`` is now ``undef``, +and debug info is needlessly lost. + +To solve this problem, do: + +.. code-block:: cpp + + llvm::replaceAllDbgUsesWith(%c, theSimplifiedAndInstruction, ...) + +This results in better debug info because the debug use of ``%c`` is preserved: + +.. code-block:: llvm + + define i16 @foo(i16 %a) { + %simplified = and i16 %a, 15 + call void @llvm.dbg.value(metadata i16 %simplified, ...) + ret i16 %simplified + } + +You may have noticed that ``%simplified`` is narrower than ``%c``: this is not +a problem, because ``llvm::replaceAllDbgUsesWith`` takes care of inserting the +necessary conversion operations into the DIExpressions of updated debug uses. + +Hoisting an Instruction +----------------------- + +TODO + +Sinking an Instruction +---------------------- + +TODO + +Cloning an Instruction +---------------------- + +TODO + +Merging two Instructions +------------------------ + +TODO + +Creating an artificial Instruction +---------------------------------- + +TODO + +Mutation testing for IR-level transformations +--------------------------------------------- + +An IR test case for a transformation can, in many cases, be automatically +mutated to test debug info handling within that transformation. This is a +simple way to test for proper debug info handling. 
+ +The ``debugify`` utility +^^^^^^^^^^^^^^^^^^^^^^^^ + +The ``debugify`` testing utility is just a pair of passes: ``debugify`` and +``check-debugify``. + +The first applies synthetic debug information to every instruction of the +module, and the second checks that this DI is still available after an +optimization has occurred, reporting any errors/warnings while doing so. + +The instructions are assigned sequentially increasing line locations, and are +immediately used by debug value intrinsics everywhere possible. + +For example, here is a module before: + +.. code-block:: llvm + + define void @f(i32* %x) { + entry: + %x.addr = alloca i32*, align 8 + store i32* %x, i32** %x.addr, align 8 + %0 = load i32*, i32** %x.addr, align 8 + store i32 10, i32* %0, align 4 + ret void + } + +and after running ``opt -debugify``: + +.. code-block:: llvm + + define void @f(i32* %x) !dbg !6 { + entry: + %x.addr = alloca i32*, align 8, !dbg !12 + call void @llvm.dbg.value(metadata i32** %x.addr, metadata !9, metadata !DIExpression()), !dbg !12 + store i32* %x, i32** %x.addr, align 8, !dbg !13 + %0 = load i32*, i32** %x.addr, align 8, !dbg !14 + call void @llvm.dbg.value(metadata i32* %0, metadata !11, metadata !DIExpression()), !dbg !14 + store i32 10, i32* %0, align 4, !dbg !15 + ret void, !dbg !16 + } + + !llvm.dbg.cu = !{!0} + !llvm.debugify = !{!3, !4} + !llvm.module.flags = !{!5} + + !0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "debugify", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2) + !1 = !DIFile(filename: "debugify-sample.ll", directory: "/") + !2 = !{} + !3 = !{i32 5} + !4 = !{i32 2} + !5 = !{i32 2, !"Debug Info Version", i32 3} + !6 = distinct !DISubprogram(name: "f", linkageName: "f", scope: null, file: !1, line: 1, type: !7, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: true, unit: !0, retainedNodes: !8) + !7 = !DISubroutineType(types: !2) + !8 = !{!9, !11} + !9 = !DILocalVariable(name: "1", scope: 
!6, file: !1, line: 1, type: !10) + !10 = !DIBasicType(name: "ty64", size: 64, encoding: DW_ATE_unsigned) + !11 = !DILocalVariable(name: "2", scope: !6, file: !1, line: 3, type: !10) + !12 = !DILocation(line: 1, column: 1, scope: !6) + !13 = !DILocation(line: 2, column: 1, scope: !6) + !14 = !DILocation(line: 3, column: 1, scope: !6) + !15 = !DILocation(line: 4, column: 1, scope: !6) + !16 = !DILocation(line: 5, column: 1, scope: !6) + +Using ``debugify`` +^^^^^^^^^^^^^^^^^^ + +A simple way to use ``debugify`` is as follows: + +.. code-block:: bash + + $ opt -debugify -pass-to-test -check-debugify sample.ll + +This will inject synthetic DI into ``sample.ll``, run ``pass-to-test``, and +then check for missing DI. The ``-check-debugify`` step can of course be +omitted in favor of more customizable FileCheck directives. + +Some other ways to run debugify are available: + +.. code-block:: bash + + # Same as the above example. + $ opt -enable-debugify -pass-to-test sample.ll + + # Suppresses verbose debugify output. + $ opt -enable-debugify -debugify-quiet -pass-to-test sample.ll + + # Prepend -debugify before and append -check-debugify -strip after + # each pass on the pipeline (similar to -verify-each). + $ opt -debugify-each -O2 sample.ll + +In order for ``check-debugify`` to work, the DI must be coming from +``debugify``. Thus, modules with existing DI will be skipped. + +``debugify`` can be used to test a backend, e.g.: + +.. code-block:: bash + + $ opt -debugify < sample.ll | llc -o - + +There is also a MIR-level debugify pass that can be run before each backend +pass, see: +:ref:`Mutation testing for MIR-level transformations`. + +``debugify`` in regression tests +^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ + +The output of the ``debugify`` pass must be stable enough to use in regression +tests. Changes to this pass are not allowed to break existing tests. + +.. note:: + + Regression tests must be robust. Avoid hardcoding line/variable numbers in + check lines. 
In cases where this can't be avoided (say, if a test wouldn't + be precise enough), moving the test to its own file is preferred. + +MIR-level transformations +========================= + +Deleting a MachineInstr +----------------------- + +TODO + +Hoisting a MachineInstr +----------------------- + +TODO + +Sinking a MachineInstr +---------------------- + +TODO + +Cloning a MachineInstr +---------------------- + +TODO + +Creating an artificial MachineInstr +----------------------------------- + +TODO + +Mutation testing for MIR-level transformations +---------------------------------------------- + +A variant of the ``debugify`` utility described in :ref:`Mutation testing for +IR-level transformations` can be used for MIR-level transformations as well: +much like the IR-level pass, ``mir-debugify`` assigns sequentially increasing +line locations to each ``MachineInstr`` in a ``Module`` (although there is no +equivalent MIR-level ``check-debugify`` pass). + +For example, here is a snippet before: + +.. code-block:: llvm + + name: test + body: | + bb.1 (%ir-block.0): + %0:_(s32) = IMPLICIT_DEF + %1:_(s32) = IMPLICIT_DEF + %2:_(s32) = G_CONSTANT i32 2 + %3:_(s32) = G_ADD %0, %2 + %4:_(s32) = G_SUB %3, %1 + +and after running ``llc -run-pass=mir-debugify``: + +.. 
code-block:: llvm + + name: test + body: | + bb.0 (%ir-block.0): + %0:_(s32) = IMPLICIT_DEF debug-location !12 + DBG_VALUE %0(s32), $noreg, !9, !DIExpression(), debug-location !12 + %1:_(s32) = IMPLICIT_DEF debug-location !13 + DBG_VALUE %1(s32), $noreg, !11, !DIExpression(), debug-location !13 + %2:_(s32) = G_CONSTANT i32 2, debug-location !14 + DBG_VALUE %2(s32), $noreg, !9, !DIExpression(), debug-location !14 + %3:_(s32) = G_ADD %0, %2, debug-location !DILocation(line: 4, column: 1, scope: !6) + DBG_VALUE %3(s32), $noreg, !9, !DIExpression(), debug-location !DILocation(line: 4, column: 1, scope: !6) + %4:_(s32) = G_SUB %3, %1, debug-location !DILocation(line: 5, column: 1, scope: !6) + DBG_VALUE %4(s32), $noreg, !9, !DIExpression(), debug-location !DILocation(line: 5, column: 1, scope: !6) + +By default, ``mir-debugify`` inserts ``DBG_VALUE`` instructions **everywhere** +it is legal to do so. In particular, every (non-PHI) machine instruction that +defines a register should be followed by a ``DBG_VALUE`` use of that def. If +an instruction does not define a register, but can be followed by a debug +instruction, MIRDebugify inserts a ``DBG_VALUE`` that references a constant. +Insertion of ``DBG_VALUE`` instructions can be disabled by setting ``-debugify-level=locations``. + +To run MIRDebugify once, simply insert ``mir-debugify`` into your ``llc`` +invocation, like: + +.. code-block:: bash + + # Before some other pass. + $ llc -run-pass=mir-debugify,other-pass ... + + # After some other pass. + $ llc -run-pass=other-pass,mir-debugify ... + +To run MIRDebugify before each pass in a pipeline, use +``-debugify-and-strip-all-safe``. This can be combined with ``-start-before`` +and ``-start-after``. For example: + +.. code-block:: bash + + $ llc -debugify-and-strip-all-safe -run-pass=... + $ llc -debugify-and-strip-all-safe -O1 + +To strip out all debug info from a test, use ``mir-strip-debug``, like: + +.. 
code-block:: bash + + $ llc -run-pass=mir-debugify,other-pass,mir-strip-debug + +It can be useful to combine ``mir-debugify`` and ``mir-strip-debug`` to +identify backend transformations which break in the presence of debug info. +For example, to run the AArch64 backend tests with all normal passes +"sandwiched" in between MIRDebugify and MIRStripDebugify mutation passes, run: + +.. code-block:: bash + + $ llvm-lit test/CodeGen/AArch64 -Dllc="llc -debugify-and-strip-all-safe" + +Using the LostDebugLocObserver +------------------------------ + +TODO diff --git a/llvm/docs/SourceLevelDebugging.rst b/llvm/docs/SourceLevelDebugging.rst --- a/llvm/docs/SourceLevelDebugging.rst +++ b/llvm/docs/SourceLevelDebugging.rst @@ -86,11 +86,12 @@ * LLVM debug information **always provides information to accurately read the source-level state of the program**, regardless of which LLVM - optimizations have been run, and without any modification to the - optimizations themselves. However, some optimizations may impact the - ability to modify the current state of the program with a debugger, such - as setting program variables, or calling functions that have been - deleted. + optimizations have been run. :doc:`HowToUpdateDebugInfo` specifies how debug + info should be updated in various kinds of code transformations to avoid + breaking this guarantee, and how to preserve as much useful debug info as + possible. Note that some optimizations may impact the ability to modify the + current state of the program with a debugger, such as setting program + variables, or calling functions that have been deleted. * As desired, LLVM optimizations can be upgraded to be aware of debugging information, allowing them to update the debugging information as they @@ -1993,180 +1994,3 @@ records, constructing a C++ test case that makes MSVC emit those records, dumping the records, understanding them, and then generating equivalent records in LLVM's backend. 
- -Testing Debug Info Preservation in Optimizations -================================================ - -The following paragraphs are an introduction to the debugify utility -and examples of how to use it in regression tests to check debug info -preservation after optimizations. - -The ``debugify`` utility ------------------------- - -The ``debugify`` synthetic debug info testing utility consists of two -main parts. The ``debugify`` pass and the ``check-debugify`` one. They are -meant to be used with ``opt`` for development purposes. - -The first applies synthetic debug information to every instruction of the module, -while the latter checks that this DI is still available after an optimization -has occurred, reporting any errors/warnings while doing so. - -The instructions are assigned sequentially increasing line locations, -and are immediately used by debug value intrinsics when possible. - -For example, here is a module before: - -.. code-block:: llvm - - define void @f(i32* %x) { - entry: - %x.addr = alloca i32*, align 8 - store i32* %x, i32** %x.addr, align 8 - %0 = load i32*, i32** %x.addr, align 8 - store i32 10, i32* %0, align 4 - ret void - } - -and after running ``opt -debugify`` on it we get: - -.. 
code-block:: text - - define void @f(i32* %x) !dbg !6 { - entry: - %x.addr = alloca i32*, align 8, !dbg !12 - call void @llvm.dbg.value(metadata i32** %x.addr, metadata !9, metadata !DIExpression()), !dbg !12 - store i32* %x, i32** %x.addr, align 8, !dbg !13 - %0 = load i32*, i32** %x.addr, align 8, !dbg !14 - call void @llvm.dbg.value(metadata i32* %0, metadata !11, metadata !DIExpression()), !dbg !14 - store i32 10, i32* %0, align 4, !dbg !15 - ret void, !dbg !16 - } - - !llvm.dbg.cu = !{!0} - !llvm.debugify = !{!3, !4} - !llvm.module.flags = !{!5} - - !0 = distinct !DICompileUnit(language: DW_LANG_C, file: !1, producer: "debugify", isOptimized: true, runtimeVersion: 0, emissionKind: FullDebug, enums: !2) - !1 = !DIFile(filename: "debugify-sample.ll", directory: "/") - !2 = !{} - !3 = !{i32 5} - !4 = !{i32 2} - !5 = !{i32 2, !"Debug Info Version", i32 3} - !6 = distinct !DISubprogram(name: "f", linkageName: "f", scope: null, file: !1, line: 1, type: !7, isLocal: false, isDefinition: true, scopeLine: 1, isOptimized: true, unit: !0, retainedNodes: !8) - !7 = !DISubroutineType(types: !2) - !8 = !{!9, !11} - !9 = !DILocalVariable(name: "1", scope: !6, file: !1, line: 1, type: !10) - !10 = !DIBasicType(name: "ty64", size: 64, encoding: DW_ATE_unsigned) - !11 = !DILocalVariable(name: "2", scope: !6, file: !1, line: 3, type: !10) - !12 = !DILocation(line: 1, column: 1, scope: !6) - !13 = !DILocation(line: 2, column: 1, scope: !6) - !14 = !DILocation(line: 3, column: 1, scope: !6) - !15 = !DILocation(line: 4, column: 1, scope: !6) - !16 = !DILocation(line: 5, column: 1, scope: !6) - -The following is an example of the -check-debugify output: - -.. 
code-block:: none - - $ opt -enable-debugify -loop-vectorize llvm/test/Transforms/LoopVectorize/i8-induction.ll -disable-output - ERROR: Instruction with empty DebugLoc in function f -- %index = phi i32 [ 0, %vector.ph ], [ %index.next, %vector.body ] - -Errors/warnings can range from instructions with empty debug location to an -instruction having a type that's incompatible with the source variable it describes, -all the way to missing lines and missing debug value intrinsics. - -Fixing errors -^^^^^^^^^^^^^ - -Each of the errors above has a relevant API available to fix it. - -* In the case of missing debug location, ``Instruction::setDebugLoc`` or possibly - ``IRBuilder::setCurrentDebugLocation`` when using a Builder and the new location - should be reused. - -* When a debug value has incompatible type ``llvm::replaceAllDbgUsesWith`` can be used. - After a RAUW call an incompatible type error can occur because RAUW does not handle - widening and narrowing of variables while ``llvm::replaceAllDbgUsesWith`` does. It is - also capable of changing the DWARF expression used by the debugger to describe the variable. - It also prevents use-before-def by salvaging or deleting invalid debug values. - -* When a debug value is missing ``llvm::salvageDebugInfo`` can be used when no replacement - exists, or ``llvm::replaceAllDbgUsesWith`` when a replacement exists. - -Using ``debugify`` ------------------- - -In order for ``check-debugify`` to work, the DI must be coming from -``debugify``. Thus, modules with existing DI will be skipped. - -The most straightforward way to use ``debugify`` is as follows:: - - $ opt -debugify -pass-to-test -check-debugify sample.ll - -This will inject synthetic DI to ``sample.ll`` run the ``pass-to-test`` -and then check for missing DI. - -Some other ways to run debugify are available: - -.. code-block:: bash - - # Same as the above example. - $ opt -enable-debugify -pass-to-test sample.ll - - # Suppresses verbose debugify output. 
- $ opt -enable-debugify -debugify-quiet -pass-to-test sample.ll - - # Prepend -debugify before and append -check-debugify -strip after - # each pass on the pipeline (similar to -verify-each). - $ opt -debugify-each -O2 sample.ll - -``debugify`` can also be used to test a backend, e.g: - -.. code-block:: bash - - $ opt -debugify < sample.ll | llc -o - - -``debugify`` in regression tests -^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ - -The ``-debugify`` pass is especially helpful when it comes to testing that -a given pass preserves DI while transforming the module. For this to work, -the ``-debugify`` output must be stable enough to use in regression tests. -Changes to this pass are not allowed to break existing tests. - -It allows us to test for DI loss in the same tests we check that the -transformation is actually doing what it should. - -Here is an example from ``test/Transforms/InstCombine/cast-mul-select.ll``: - -.. code-block:: llvm - - ; RUN: opt < %s -debugify -instcombine -S | FileCheck %s --check-prefix=DEBUGINFO - - define i32 @mul(i32 %x, i32 %y) { - ; DBGINFO-LABEL: @mul( - ; DBGINFO-NEXT: [[C:%.*]] = mul i32 {{.*}} - ; DBGINFO-NEXT: call void @llvm.dbg.value(metadata i32 [[C]] - ; DBGINFO-NEXT: [[D:%.*]] = and i32 {{.*}} - ; DBGINFO-NEXT: call void @llvm.dbg.value(metadata i32 [[D]] - - %A = trunc i32 %x to i8 - %B = trunc i32 %y to i8 - %C = mul i8 %A, %B - %D = zext i8 %C to i32 - ret i32 %D - } - -Here we test that the two ``dbg.value`` instrinsics are preserved and -are correctly pointing to the ``[[C]]`` and ``[[D]]`` variables. - -.. note:: - - Note, that when writing this kind of regression tests, it is important - to make them as robust as possible. That's why we should try to avoid - hardcoding line/variable numbers in check lines. If for example you test - for a ``DILocation`` to have a specific line number, and someone later adds - an instruction before the one we check the test will fail. 
In the cases this - can't be avoided (say, if a test wouldn't be precise enough), moving the - test to its own file is preferred. diff --git a/llvm/docs/UserGuides.rst b/llvm/docs/UserGuides.rst --- a/llvm/docs/UserGuides.rst +++ b/llvm/docs/UserGuides.rst @@ -35,6 +35,7 @@ HowToBuildWithPGO HowToCrossCompileBuiltinsOnArm HowToCrossCompileLLVM + HowToUpdateDebugInfo LinkTimeOptimization LoopTerminology MarkdownQuickstartTemplate @@ -196,4 +197,4 @@ :doc:`AMDGPUDwarfProposalForHeterogeneousDebugging` This document describes a DWARF proposal to support heterogeneous debugging - for targets such as the AMDGPU backend. \ No newline at end of file + for targets such as the AMDGPU backend.