diff --git a/llvm/docs/InstrRefDebugInfo.md b/llvm/docs/InstrRefDebugInfo.md new file mode 100644 --- /dev/null +++ b/llvm/docs/InstrRefDebugInfo.md @@ -0,0 +1,180 @@ +# Instruction referencing for debug info + +This document explains how LLVM uses value tracking, or instruction +referencing, to determine variable locations for debug info in the code +generation stage of compilation. This content is aimed at those working on code +generation targets and optimisation passes. It may also be of interest to anyone +curious about low-level debug info handling. + +# Problem statement + +At the end of compilation, LLVM must produce a DWARF location list (or similar) +describing what register or stack location a variable can be found in, for each +instruction in that variable's lexical scope. We could track the virtual +register that the variable resides in through compilation, however this is +vulnerable to register optimisations during regalloc, and instruction +movements. + +# Solution: instruction referencing + +Rather than identify the virtual register that a variable value resides in, +instead in instruction referencing mode, LLVM refers to the machine instruction +and operand position that the value is defined in. Consider the LLVM IR way of +referring to instruction values: + + %2 = add i32 %0, %1 + call void @llvm.dbg.value(metadata i32 %2, + +In LLVM IR, the IR Value is synonymous with the instruction that computes the +value, to the extent that in memory a Value is a pointer to the computing +instruction. Instruction referencing implements this relationship in the +codegen backend of LLVM, after instruction selection. Consider the X86 assembly +below and instruction referencing debug info, corresponding to the earlier +LLVM IR: + + %2:gr32 = ADD32rr %0, %1, implicit-def $eflags, debug-instr-number 1 + DBG_INSTR_REF 1, 0, !123, !456, debug-location !789 + +While the function remains in SSA form, virtual register %2 is sufficient to +identify the value computed by the instruction -- however the function +eventually leaves SSA form, and register optimisations will obscure which +register the desired value is in. Instead, a more consistent way of identifying +the instruction's value is to refer to the MachineOperand where the value is +defined: independently of which register is defined by that MachineOperand. In +the code above, the DBG_INSTR_REF instruction refers to instruction number one, +operand zero, while the ADD32rr has a debug-instr-number attribute attached +indicating that it is instruction number one. + +De-coupling variable locations from registers avoids difficulties involving +register allocation and optimisation, but requires additional instrumentation +when the instructions are optimised instead. Optimisations that replace +instructions with optimised versions that compute the same value must either +preserve the instruction number, or record a substitution from the old +instruction / operand number pair to the new instruction / operand pair -- see +MachineFunction::substituteDebugValuesForInst. If debug info maintenance is not +performed, or an instruction is eliminated as dead code, the variable location +is safely dropped and marked "optimised out". The exception is instructions +that are mutated rather than replaced, which always need debug info +maintenance. + +# Register allocator considerations + +When the register allocator runs, debugging instructions do not directly refer +to any virtual registers, and thus there is no need for expensive location +maintenance during regalloc (i.e., LiveDebugVariables). Debug instructions are +unlinked from the function, then linked back in after register allocation +completes. + +The exception is PHI instructions: these become implicit definitions at control +flow merges once regalloc finishes, and any debug numbers attached to PHI +instructions are lost. To circumvent this, debug numbers of PHIs are recorded +at the start of register allocation (phi-node-elimination), then DBG_PHI +instructions are inserted after regalloc finishes. This requires some +maintenance of which register a variable is located in during regalloc, but at +single positions (block entry points) rather than ranges of instructions. + +An example, before regalloc: + + bb.2: + %2 = PHI %1, %bb.0, %2, %bb.1, debug-instr-number 1 + +After: + + bb.2: + DBG_PHI $rax, 1 + +# LiveDebugValues + +After optimisations and code layout complete, information about variable +values must be translated into variable locations, i.e. registers and stack +slots. This is performed in the [LiveDebugValues pass][LiveDebugValues], where +the debug instructions and machine code are separated out into two independent +functions: + * One that assigns values to variable names, + * One that assigns values to machine registers and stack slots. + +LLVM's existing SSA tools are used to place PHIs for each function, between +variable values and the values contained in machine locations, with value +propagation eliminating any un-necessary PHIs. The two can then be joined up +to map variables to values, then values to locations, for each instruction in +the function. + +Key to this process is being able to identify the movement of values between +registers and stack locations, so that the location of values can be preserved +for the full time that they are resident in the machine. + +# Required target support and transition guide + +Instruction referencing will work on any target, but likely with poor coverage. +Supporting instruction referencing well requires: + * Target hooks to be implemented to allow LiveDebugValues to follow values through the machine, + * Target-specific optimisations to be instrumented, to preserve instruction numbers. + +## Target hooks + +TargetInstrInfo::isCopyInstrImpl must be implemented to recognise any +instructions that are copy-like -- LiveDebugValues uses this to identify when +values move between registers. + +TargetInstrInfo::isLoadFromStackSlotPostFE and +TargetInstrInfo::isStoreToStackSlotPostFE are needed to identify spill and +restore instructions. Each should return the destination or source register +respectively. LiveDebugValues will track the movement of a value from / to +the stack slot. In addition, any instruction that writes to a stack spill +should have a MachineMemoryOperand attached, so that LiveDebugValues can +recognise that a slot has been clobbered. + +## Target-specific optimisation instrumentation + +Optimisations come in two flavours: those that mutate a MachineInstr to make +it do something different, and those that create a new instruction to replace +the operation of the old. + +The former _must_ be instrumented -- the relevant question is whether any +register def in any operand will produce a different value, as a result of the +mutation. If the answer is yes, then there is a risk that a DBG_INSTR_REF +instruction referring to that operand will end up assigning the different +value to a variable, presenting the debugging developer with an unexpected +variable value. In such scenarios, call MachineInstr::dropDebugNumber() on the +mutated instruction to erase its instruction number. Any DBG_INSTR_REF +referring to it will produce an empty variable location instead, that appears +as "optimised out" in the debugger. + +For the latter flavour of optimisation, to increase coverage you should record +an instruction number substitution: a mapping from the old instruction number / +operand pair to new instruction number / operand pair. Consider if we replace +a three-address add instruction with a two-address add: + + %2:gr32 = ADD32rr %0, %1, debug-instr-number 1 + +becomes + + %2:gr32 = ADD32rr %0(tied-def 0), %1, debug-instr-number 2 + +With a substitution from "instruction number 1 operand 0" to "instruction number +2 operand 0" recorded in the MachineFunction. In LiveDebugValues, DBG_INSTR_REFs +will be mapped through the substitution table to find the most recent +instruction number / operand number of the value it refers to. + +Use MachineFunction::substituteDebugValuesForInst to automatically produce +substitutions between an old and new instruction. It assumes that any operand +that is a def in the old instruction is a def in the new instruction at the +same operand position. This works most of the time, for example in the example +above. + +If operand numbers do not line up between the old and new instruction, use +MachineInstr::getDebugInstrNum to acquire the instruction number for the new +instruction, and MachineFunction::makeDebugValueSubstitution to record the +mapping between register definitions in the old and new instructions. If some +values computed by the old instruction are no longer computed by the new +instruction, record no substitution -- LiveDebugValues will safely drop the +now unavailable variable value. + +Should your target clone instructions, much the same as the TailDuplicator +optimisation pass, do not attempt to preserve the instruction numbers or +record any substitutions. MachineFunction::CloneMachineInstr should drop the +instruction number of any cloned instruction, to avoid duplicate numbers +appearing to LiveDebugValues. Dealing with duplicated instructions is a +natural extension to instruction referencing that's currently unimplemented. + +[LiveDebugValues]: SourceLevelDebugging.html#livedebugvalues-expansion-of-variable-locations diff --git a/llvm/docs/MIRLangRef.rst b/llvm/docs/MIRLangRef.rst --- a/llvm/docs/MIRLangRef.rst +++ b/llvm/docs/MIRLangRef.rst @@ -911,5 +911,6 @@ and the ``DBG_INSTR_REF`` s position records where the variable takes on the designated value in the same way. -More information about how these constructs are used will appear on the source -level debugging page in due course, see also :doc:`SourceLevelDebugging` and :doc:`HowToUpdateDebugInfo`. +More information about how these constructs are used is available in +:doc:`InstrRefDebugInfo`. The related documents :doc:`SourceLevelDebugging` and +:doc:`HowToUpdateDebugInfo` may be useful as well. diff --git a/llvm/docs/UserGuides.rst b/llvm/docs/UserGuides.rst --- a/llvm/docs/UserGuides.rst +++ b/llvm/docs/UserGuides.rst @@ -38,6 +38,7 @@ HowToCrossCompileBuiltinsOnArm HowToCrossCompileLLVM HowToUpdateDebugInfo + InstrRefDebugInfo LinkTimeOptimization LoopTerminology MarkdownQuickstartTemplate @@ -160,6 +161,15 @@ This document describes the design and philosophy behind the LLVM source-level debugger. +:doc:`How to Update Debug Info ` + This document specifies how to correctly update debug info in various kinds + of code transformations. + +:doc:`InstrRefDebugInfo` + This document explains how LLVM uses value tracking, or instruction + referencing, to determine variable locations for debug info in the final + stages of compilation. + Code Generation ---------------