Index: llvm/docs/AssignmentTracking.md
===================================================================
--- /dev/null
+++ llvm/docs/AssignmentTracking.md
@@ -0,0 +1,190 @@
+# Assignment Tracking
+
+Assignment Tracking is an alternative technique for tracking variable locations
+through optimisations in llvm. It provides accurate variable locations for
+assignments where a local variable (or a field of one) is the LHS. Indirect
+assignments that get optimized out may not be visible, but the locations should
+otherwise be reasonable.
+
+The core idea is to track more information about source assignments in order
+and preserve enough information to be able to defer decisions about whether to
+use non-memory locations (register, constant) or memory locations until after
+middle end optimisations have run. This is in opposition to what we have now,
+which is to make the decision for most variables early on, which can result in
+suboptimal variable locations that may be either incorrect or incomplete.
+
+A secondary goal of assignment tracking is to cause minimal additional work for
+llvm pass writers, and minimal disruption to llvm in general.
+
+## Status and usage
+
+**Status**: Experimental work in progress. Enabling is strongly advised against
+except for development and testing.
+
+**Enable in clang**: -Xclang -fexperimental-assignment-tracking
+
+**Enable in llvm tools**: -experimental-assignment-tracking
+
+## Design and implementation
+
+### Assignment markers: llvm.dbg.assign
+
+`llvm.dbg.value`, a conventional debug intrinsic, marks out a position in the
+IR where a variable takes a particular value. Similarly, Assignment Tracking
+marks out the position of assignments with a new intrinsic called
+`llvm.dbg.assign`.
+
+In order to know where in IR it is appropriate to use a memory location for a
+variable, each assignment marker must in some way refer to the store, if any
+(or multiple!), that performs the assignment.
+That way, the position of the
+store and marker can be considered together when making that choice. Another
+important benefit of referring to the store is that we can then build a two-way
+mapping of stores<->markers that can be used to find markers that need to be
+updated when stores are modified.
+
+An `llvm.dbg.assign` marker that is not linked to any instruction signals that
+the store that performed the assignment has been optimised out, and therefore
+the memory location will not be valid for at least some part of the program.
+
+This is the actual signature of an `llvm.dbg.assign` in IR:
+```
+void @llvm.dbg.assign(metadata, metadata, metadata, metadata, metadata, metadata)
+```
+
+That isn't particularly illuminating, so here's the signature in practice. Each
+parameter is wrapped in `MetadataAsValue`, and `Value *` type parameters are
+first wrapped in `ValueAsMetadata`:
+
+```
+void @llvm.dbg.assign(Value *Value,
+                      DIExpression *ValueExpression,
+                      DILocalVariable *Variable,
+                      DIAssignID *ID,
+                      Value *Address,
+                      DIExpression *AddressExpression)
+```
+
+The first three parameters look and behave like those of an `llvm.dbg.value`.
+`ID` is a reference to a store (see next section). `Address` is the destination
+address of the store and it is modified by `AddressExpression`. llvm currently
+encodes variable fragment information in `DIExpression`s, so as an
+implementation quirk the `FragmentInfo` for `Variable` is contained within
+`ValueExpression` only.
+
+### Instruction link: DIAssignID
+
+`DIAssignID` metadata is the mechanism that is currently used to encode the
+store<->marker link. It has no operands and all instances are `distinct`;
+equality is checked for by comparing addresses.
+
+Each `llvm.dbg.assign` intrinsic `use`s a `DIAssignID` metadata instance as an
+operand. This way it "refers to" any store-like instruction that has the same
+`DIAssignID` attachment. E.g.
+For this test.cpp,
+
+```
+int fun(int a) {
+  return a;
+}
+```
+compiled without optimisations:
+```
+$ clang++ test.cpp -o test.ll -emit-llvm -S -g -O0 -Xclang -fexperimental-assignment-tracking
+```
+we get:
+```
+define dso_local noundef i32 @_Z3funi(i32 noundef %a) #0 !dbg !8 {
+entry:
+  %a.addr = alloca i32, align 4, !DIAssignID !13
+  call void @llvm.dbg.assign(metadata i1 undef, metadata !14, metadata !DIExpression(), metadata !13, metadata i32* %a.addr, metadata !DIExpression()), !dbg !15
+  store i32 %a, i32* %a.addr, align 4, !DIAssignID !16
+  call void @llvm.dbg.assign(metadata i32 %a, metadata !14, metadata !DIExpression(), metadata !16, metadata i32* %a.addr, metadata !DIExpression()), !dbg !15
+  %0 = load i32, i32* %a.addr, align 4, !dbg !17
+  ret i32 %0, !dbg !18
+}
+
+...
+!13 = distinct !DIAssignID()
+!14 = !DILocalVariable(name: "a", ...)
+...
+!16 = distinct !DIAssignID()
+```
+
+The first `llvm.dbg.assign` refers to the `alloca` through `!DIAssignID !13`,
+and the second refers to the `store` through `!DIAssignID !16`.
+
+### Store-like instructions
+
+In the absence of a linked `llvm.dbg.assign`, a store to an address that is
+known to be the backing storage for a variable is considered to represent an
+assignment to that variable.
+
+This gives us a safe fall-back in cases where `llvm.dbg.assign` intrinsics have
+been deleted, the `DIAssignID` attachment on the store has been dropped, or the
+optimiser has made a once-indirect store (not tracked with Assignment Tracking)
+direct.
+
+### Middle-end: Considerations for pass-writers
+
+Considerations for pass writers and maintainers:
+
+**cloning** an instruction: nothing new to do. Cloning automatically clones a
+`DIAssignID` attachment. Multiple instructions may have the same `DIAssignID`
+attachment. In this case, the assignment is considered to take place in
+multiple positions in the program.
+
+**moving** a non-debug instruction: nothing new to do.
+Instructions linked to an
+`llvm.dbg.assign` have their initial IR position marked by the position of the
+`llvm.dbg.assign`.
+
+**moving** a debug intrinsic: avoid moving `llvm.dbg.assign` intrinsics where
+possible, as they represent a source-level assignment, whose position in the
+program should not be affected by optimization passes.
+
+**deleting** a non-debug instruction: nothing new to do. Simple DSE does not
+require any change; it’s safe to delete an instruction with a `DIAssignID`
+attachment. An `llvm.dbg.assign` that uses a `DIAssignID` that is not attached
+to any instruction indicates that the memory location isn’t valid.
+
+**deleting** a debug intrinsic: nothing new to do. Just like for conventional
+debug intrinsics, unless it is unreachable, it’s almost always incorrect to
+delete an `llvm.dbg.assign` intrinsic.
+
+**merging** stores: In many cases no change is required as `DIAssignID`
+attachments are automatically merged if `combineMetadata` is called. One way or
+another, the `DIAssignID` attachments must be merged such that the new store
+becomes linked to all the `llvm.dbg.assign` intrinsics that the merged stores
+were linked to. This can be achieved simply by calling the helper function
+`Instruction::mergeDIAssignID`.
+
+**inlining** stores: As stores are inlined we generate `llvm.dbg.assign`
+intrinsics and `DIAssignID` attachments as if the stores represent source
+assignments, just like in the frontend. This isn’t perfect, as stores may have
+been moved, modified or deleted before inlining, but it does at least keep the
+information about the variable correct within the non-inlined scope.
+
+**splitting** stores: SROA and passes that split stores treat `llvm.dbg.assign`
+intrinsics similarly to `llvm.dbg.declare` intrinsics. Clone the
+`llvm.dbg.assign` intrinsics linked to the store, update the `FragmentInfo` in
+the `ValueExpression`, and give the split stores (and cloned intrinsics) new
+`DIAssignID` attachments each.
+In other words, treat the split stores as
+separate assignments. For partial DSE (e.g. shortening a memset), we do the
+same except that the `llvm.dbg.assign` for the dead fragment gets an `Undef`
+`Address`.
+
+**promoting** allocas and stores/loads: `llvm.dbg.assign` intrinsics implicitly
+describe joined values in memory locations at CFG joins, but this is not
+necessarily the case after promoting (or partially promoting) the
+variable. Passes that promote variables are responsible for inserting
+`llvm.dbg.assign` intrinsics after the resultant PHIs generated during
+promotion. mem2reg already has to do this (with `llvm.dbg.value`) for
+`llvm.dbg.declare`s. Where a store has no linked intrinsic, the store is
+assumed to represent an assignment for variables stored at the destination
+address.
+
+### Lowering llvm.dbg.assign to MIR
+
+To begin with, only SelectionDAG ISel will be supported. `llvm.dbg.assign`
+intrinsics are lowered to MIR `DBG_INSTR_REF` instructions. Before this happens
+we need to decide where it is appropriate to use memory locations and where we
+must use a non-memory location (or no location) for each variable. In order to
+make those decisions we run a standard fixed-point dataflow analysis that makes
+the choice at each instruction, iteratively joining the results for each block.
+
Index: llvm/docs/SourceLevelDebugging.rst
===================================================================
--- llvm/docs/SourceLevelDebugging.rst
+++ llvm/docs/SourceLevelDebugging.rst
@@ -251,6 +251,24 @@
 be indirect (i.e, a pointer to the source variable), provided that interpreting
 the complex expression derives the direct value.
 
+``llvm.dbg.assign``
+^^^^^^^^^^^^^^^^^^^
+
+.. code-block:: llvm
+
+  void @llvm.dbg.assign(metadata, metadata, metadata, metadata, metadata, metadata)
+
+This intrinsic marks the position in IR where a source assignment occurred. It
+encodes the value of the variable.
+It references the store, if any, that
+performs the assignment, and the destination address.
+
+The first three arguments are the same as for an `llvm.dbg.value`. The fourth
+argument is a DIAssignID used to reference a store. The fifth is the
+destination of the store (wrapped as metadata), and the sixth is a `complex
+expression `_ that modifies it.
+
+See llvm/docs/AssignmentTracking.md for more info.
+
 Object lifetimes and scoping
 ============================