Index: llvm/docs/InstrRefDebugInfo.md
===================================================================
--- /dev/null
+++ llvm/docs/InstrRefDebugInfo.md
@@ -0,0 +1,173 @@
+This document explains how LLVM uses value tracking, or instruction
+referencing, to determine variable locations for debug-info in the final stages
+of compilation.
+
+# Problem statement
+
+At the end of compilation, LLVM must produce a DWARF location list (or similar)
+describing what register or stack location a variable can be found in, for each
+instruction in that variable's lexical scope. We could track the virtual
+register that the variable resides in through compilation; however, this is
+vulnerable to register optimisations during regalloc, and instruction
+movements.
+
+# Solution: instruction referencing
+
+Rather than identifying the virtual register that a variable value resides in,
+in instruction referencing mode LLVM refers to the machine instruction
+and operand position that the value is defined in. Consider the LLVM-IR way of
+referring to instruction values:
+
+    %2 = add i32 %0, %1
+    call void @llvm.dbg.value(metadata i32 %2,
+
+In LLVM-IR, the IR Value is synonymous with the instruction that computes the
+value, to the extent that in memory a Value is a pointer to the computing
+instruction. Instruction referencing implements this relationship in the
+codegen backend of LLVM, after instruction selection. Consider the X86 assembly
+below and instruction referencing debug-info, corresponding to the earlier
+LLVM-IR:
+
+    %2:gr32 = ADD32rr %0, %1, implicit-def $eflags, debug-instr-number 1
+    DBG_INSTR_REF 1, 0, !123, !456, debug-location !789
+
+While the function remains in SSA form, virtual register %2 is sufficient to
+identify the value computed by the instruction -- however the function
+eventually leaves SSA form, and register optimisations will obscure which
+register the desired value is in.
+Instead, a more consistent way of identifying
+the instruction's value is to refer to the MachineOperand where the value is
+defined, independently of which register is defined by that MachineOperand. In
+the code above, the DBG_INSTR_REF instruction refers to instruction number
+one, operand zero, while the ADD32rr has a debug-instr-number attribute
+attached indicating that it is instruction number one.
+
+De-coupling variable locations from registers avoids difficulties involving
+register allocation and optimisation, but requires additional instrumentation
+when the instructions are optimised instead. Optimisations that replace
+instructions with optimised versions that compute the same value must either
+preserve the instruction number, or record a substitution from the old
+instruction / operand number pair to the new instruction / operand pair -- see
+MachineFunction::substituteDebugValuesForInst. If debug-info maintenance is not
+performed, or an instruction is eliminated as dead code, the variable location
+is safely dropped and marked "optimised out". The exception is instructions
+that are mutated rather than replaced, which always need debug-info
+maintenance.
+
+# Register allocator considerations
+
+When the register allocator runs, debugging instructions do not directly refer
+to any virtual registers, and thus there is no need for expensive location
+maintenance during regalloc (i.e., LiveDebugVariables). Debug instructions are
+unlinked from the function, then linked back in after register allocation
+completes.
+
+The exception is PHI instructions: these become implicit definitions at control
+flow merges once regalloc finishes, and any debug numbers attached to PHI
+instructions are lost. To circumvent this, debug numbers of PHIs are recorded
+at the start of register allocation (phi-node-elimination), then DBG_PHI
+instructions are inserted after regalloc finishes.
+This requires some
+maintenance of which register a variable is located in during regalloc, but at
+single positions (block entry points) rather than ranges of instructions.
+
+An example, before regalloc:
+
+    bb.2:
+      %2 = PHI %1, %bb.0, %2, %bb.1, debug-instr-number 1
+
+After:
+
+    bb.2:
+      DBG_PHI $rax, 1
+
+# LiveDebugValues
+
+After optimisations and code layout complete, information about variable
+values must be translated into variable locations, i.e. registers and stack
+slots. This is performed in the LiveDebugValues pass, where the debug
+instructions and machine code are separated out into two independent functions:
+ * One that assigns values to variable names,
+ * One that assigns values to machine registers and stack slots.
+
+LLVM's existing SSA tools are used to place PHIs for each function, between
+variable values and the values contained in machine locations, with value
+propagation eliminating any unnecessary PHIs. The two can then be joined up
+to map variables to values, then values to locations, for each instruction in
+the function.
+
+Key to this process is being able to identify the movement of values between
+registers and stack locations, so that the location of values can be preserved
+for the full time that they are resident in the machine.
+
+# Required target support and transition guide
+
+Instruction referencing will work on any target, but likely with poor coverage.
+Supporting instruction referencing well requires:
+ * Target hooks to be implemented to allow LiveDebugValues to follow values through the machine,
+ * Target-specific optimisations to be instrumented, to preserve instruction numbers.
+
+## Target hooks
+
+TargetInstrInfo::isCopyInstrImpl must be implemented to recognise any
+instructions that are copy-like -- LiveDebugValues uses this to identify when
+values move between registers.
+
+TargetInstrInfo::isLoadFromStackSlotPostFE and
+TargetInstrInfo::isStoreToStackSlotPostFE are needed to identify restore and
+spill instructions. Each should return the destination or source register
+respectively. LiveDebugValues will track the movement of a value from / to
+the stack slot. In addition, any instruction that writes to a stack spill
+should have a MachineMemOperand attached, so that LiveDebugValues can
+recognise that a slot has been clobbered.
+
+## Target-specific optimisation instrumentation
+
+Optimisations come in two flavours: those that mutate a MachineInstr to make
+it do something different, and those that create a new instruction to replace
+the operation of the old.
+
+The former _must_ be instrumented -- the relevant question is whether any
+register def in any operand will produce a different value, as a result of the
+mutation. If the answer is yes, then there is a risk that a DBG_INSTR_REF
+instruction referring to that operand will end up assigning the different
+value to a variable, presenting the debugging developer with an unexpected
+variable value. In such scenarios, call MachineInstr::dropDebugNumber() on the
+mutated instruction to erase its instruction number. Any DBG_INSTR_REF
+referring to it will produce an empty variable location instead, which appears
+as "optimised out" in the debugger.
+
+For the latter flavour of optimisation, to increase coverage you should record
+an instruction number substitution: a mapping from the old instruction number /
+operand pair to the new instruction number / operand pair. Consider if we
+replace a three-address add instruction with a two-address add:
+
+    %2:gr32 = ADD32rr %0, %1, debug-instr-number 1
+
+becomes
+
+    %2:gr32 = ADD32rr %0(tied-def 0), %1, debug-instr-number 2
+
+With a substitution recorded in the MachineFunction that what was instruction
+number 1 operand 0, is now instruction number 2 operand 0.
+In LiveDebugValues,
+each DBG_INSTR_REF is mapped through the substitution table to find the most
+recent instruction number / operand number of the value it refers to.
+
+Use MachineFunction::substituteDebugValuesForInst to automatically produce
+substitutions between an old and new instruction. It assumes that any operand
+that is a def in the old instruction is a def in the new instruction at the
+same operand position. This works most of the time, for instance in the example
+above.
+
+If operand numbers do not line up between the old and new instruction, use
+MachineInstr::getDebugInstrNum to acquire the instruction number for the new
+instruction, and MachineFunction::makeDebugValueSubstitution to record the
+mapping between register defs in the old and new instructions. If some
+values computed by the old instruction are no longer computed by the new
+instruction, record no substitution -- LiveDebugValues will safely drop the
+now unavailable variable value.
+
+Should your target clone instructions, much the same as the TailDuplicator
+optimisation pass, do not attempt to preserve the instruction numbers or
+record any substitutions. MachineFunction::CloneMachineInstr should drop the
+instruction number of any cloned instruction, to avoid duplicate numbers
+appearing to LiveDebugValues. Dealing with duplicated instructions is a
+natural extension to instruction referencing that's currently unimplemented.
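
The substitution lookup described in the final section of the document can be pictured with a small standalone model. The sketch below is illustrative C++ only and is not LLVM's implementation: `DebugRef`, `SubstitutionTable` and `resolve` are hypothetical names, and it assumes that substitutions may chain (an instruction replaced more than once) and that the table contains no cycles.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <utility>

// Hypothetical stand-in for an (instruction number, operand index) pair.
using DebugRef = std::pair<unsigned, unsigned>;

// Hypothetical toy table: each entry maps an old reference to its replacement.
using SubstitutionTable = std::map<DebugRef, DebugRef>;

// Follow substitutions from Ref until no further entry exists, returning the
// most recent (instruction number, operand index) pair for the value.
// Assumption: the table is acyclic, so the walk needs at most Subs.size() hops.
DebugRef resolve(const SubstitutionTable &Subs, DebugRef Ref) {
  for (std::size_t Hops = 0; Hops <= Subs.size(); ++Hops) {
    auto It = Subs.find(Ref);
    if (It == Subs.end())
      return Ref; // No newer instruction defines this value.
    Ref = It->second;
  }
  // Only reachable if the acyclicity assumption was violated.
  assert(false && "substitution table contains a cycle");
  return Ref;
}
```

In this model, with entries (1, 0) to (2, 0) and (2, 0) to (3, 1), resolving (1, 0) walks both hops and yields (3, 1), while a reference with no entry is returned unchanged. A value for which no substitution was recorded simply keeps its old number, which matches the text's point that an unmatched reference leads to a dropped location rather than a wrong one.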