This is an archive of the discontinued LLVM Phabricator instance.

[IR] Remove the DIExpression field from DIGlobalVariable.
ClosedPublic

Authored by aprantl on Nov 16 2016, 2:58 PM.

Details

Summary

This patch implements PR31013 by introducing a DIGlobalVariableExpression that holds a pair of DIGlobalVariable and DIExpression.

Currently, DIGlobalVariables holds a DIExpression. This is not the best way to model this:

  1. The DIGlobalVariable should describe the source level variable, not how to get to its location.
  2. It makes it unsafe/hard to update the expressions when we call replaceExpression on the DIGLobalVariable.
  3. It makes it impossible to represent a global variable that is in more than one location (e.g., a variable with multiple DW_OP_piece-s). We also moved away from attaching the DIExpression to DILocalVariable for the same reasons.

A better representation would be to add a DIGlobalVariableExpression(var: !DIGlobalVariable(...), expr: !DIExpression(...)) that wraps a global variable and an expression. For example, let's say we have a global symbol g that doesn't need an expression — this would still be represented as:

@g = global i32 0, !dbg !0
!0 = DIGlobalVariable(name: "g", ...)

Later, after some transformation, this becomes:

@opt_g = global i32 *, !dbg !1
!0 = distinct !DIGlobalVariable(name: "g", ...) ; Identical to !0 above.
!1 = !DIGlobalVariableExpr(var: !0, expr: !2)   ; Not distinct.
!2 = !DIExpression(DW_OP_deref, ...)

This allows transformations to just add new mergeable metadata on top of the existing DIGlobalVariable without blowing up the representation if they aren't needed.
[reply] [-] Comment 1

For more discussion see https://llvm.org/bugs/show_bug.cgi?id=31013.

Diff Detail

Event Timeline

aprantl updated this revision to Diff 78269.Nov 16 2016, 2:58 PM
aprantl retitled this revision from to [IR] Remove the DIExpression field from DIGlobalVariable..
aprantl updated this object.
aprantl added reviewers: dexonsmith, dblaikie, pcc.
aprantl set the repository for this revision to rL LLVM.
aprantl added a subscriber: llvm-commits.

To review my own patch: I'll make another iteration replacing the MDNode * references to global variables with a more typesafe PointerUnion<GlobalVariable, GlobalVariableExpression>.

aprantl updated this revision to Diff 78544.Nov 18 2016, 9:32 AM
aprantl removed rL LLVM as the repository for this revision.

Improved type-safety by adding a DIGlobalVarExpr pointer union.

aprantl updated this revision to Diff 80685.Dec 7 2016, 4:12 PM
aprantl set the repository for this revision to rL LLVM.

Rebased on trunk, which most importantly should fix all reservations David had about DW_OP_piece in the driving example.

aprantl updated this revision to Diff 80688.Dec 7 2016, 4:32 PM
aprantl removed rL LLVM as the repository for this revision.

Now rebased the patch on https://reviews.llvm.org/D27550 which is what I really should have done. This fixes the expected output in the split-global.ll testcase.

aprantl updated this revision to Diff 80692.Dec 7 2016, 5:11 PM

One more bugfix for the split-global.ll testcase.

dblaikie added inline comments.Dec 9 2016, 10:23 AM
include/llvm/IR/DebugInfoMetadata.h
1047–1048

Usually best to make operators non-members where possible (helps with symmetric implicit conversions of LHS and RHS operands, for example). They can still be defined inline if you prefer - by declaring them as friends (which you probably want/need to do anyway)

include/llvm/IR/GlobalVariable.h
177

Is this union for backwards compatibility? (so things can still refer directly to a DIGlobalVariable?) If so, might be nice to push that compatibility earlier - map old debug info by creating new DIGlobalVariableExpressions, rather than letting the old semantics remain further through the system.

lib/Bitcode/Reader/BitcodeReader.cpp
2972–2975 ↗(On Diff #80692)

might be worth inverting this into a conditional operator

if (Attach) {
  Attach->addDebugInfo(Expr ? DGVE : DGV);
lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
2209

Not quite sure what this means/what the implications are?

lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
77

Perhaps this should be called Exprs (since some of the expressions won't themselves be globals in LLVM's sense - some might just be direct constants, etc)?

156
const auto &
195

Should finalize return the DIELoc instead of having it kept as a separate/parallel variable that DIEDwarfExpression is updating?

lib/CodeGen/AsmPrinter/DwarfDebug.cpp
489–492

(reinforcing previous comment - would be good to do a full migration rather than carrying more variety in the IR format - or at least map to the new semantics at a lower level (at bitcode/IR reading time) rather than having all the code walking a this IR having to conditionalize like this)

lib/IR/DIBuilder.cpp
87–88
std::vector<Metadata*> Elts(AllGVs.size());
87–90

Oh - this might not be doing what you want (got test coverage?)

resize + push_back, means you'll have AllGVs.size() null pointers, then all the unwrapped elements after that.

Perhaps you meant reserve?

lib/IR/DebugInfo.cpp
55–56

early return to reduce indentation?

aprantl marked 8 inline comments as done.Dec 9 2016, 2:01 PM
aprantl added inline comments.
include/llvm/IR/GlobalVariable.h
177

The union (the canonical definition is in DebugInfoMetadata.h exists so we don't need to waste a DIGlobalVariableExpression node on global variables that don't need an DIExpression. This is an optimization that Duncan suggested (though I can't find where at the moment).

lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
2209

It's just something I noticed while updating the code. Could be a separate commit.

lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
195

Not sure I understand. You mean like

class DIEDwarfExpression {
...
  DIELoc finalize() {
    DwarfExpression::finalize();
    return DIE;
  }
}

...
addBlock(*VariableDIE, dwarf::DW_AT_location, DwarfExpr.finalize());
lib/CodeGen/AsmPrinter/DwarfDebug.cpp
489–492

Yeah not sure. Doing it this way saves a couple of pointers for global variables without expressions and makes the IR easier to read.

lib/IR/DIBuilder.cpp
87–90

whoa. Thanks!

aprantl updated this revision to Diff 80954.Dec 9 2016, 2:02 PM
aprantl marked 2 inline comments as done.

Address review feedback. Thanks!

aprantl updated this revision to Diff 81164.Dec 12 2016, 5:20 PM

Uploaded a patch that gets rid of the PointerUnion and updates all textual testcases to use DIGlobalVariableExpression.

dblaikie edited edge metadata.Dec 13 2016, 9:48 AM

It'd probably be nice for the upgraded tests to not use the inline metadata representation, but have another numbered node (just for consistency - because we don't have any DI metadata like that now, and I doubt we'll write it that way by hand in the future either, so it'll be out of place, etc (possibly making future auto upgrades more difficult by having more varied/odd/surprising representations to match/handle)). But I understand if that might be difficult to find a valid number for the newly introduced node.

lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
666

This seems strange - when/how would an imported entity refer to a DIGlobalVariableExpression? It should only ever refer to a DIGlobalVariable, right?

lib/CodeGen/AsmPrinter/DwarfDebug.cpp
498–499

Wouldn't this leave the GVs from other CUs in the GVMap, causing them to be emitted into future CUs as well?

I guess the Processed check below avoids that? Could we avoid it by design in some other way, perhaps? For example when we go to insert something into the GVMap, if it's a new insertion we could emit it, otherwise we could ignore it? (I suppose this wouldn't work for the things that are put in the GVMap by the globals() loop above?)

Does the CU's global variable list currently contain all the globals (not just the constants?)? Then what's the need for the globals() loop above? To discover all instances of a global variable across CUs because they might all contribute to the location of the global in debug info?

lib/Transforms/Instrumentation/AddressSanitizer.cpp
1656

Might be worth putting the type here instead of auto so it's clear it's a handle type that can be copied intentionally/cheaply/etc.

I just finished writing a script that gets rid of the inline representation for DIGlobalVariable. I put it into a separate review, because it causes a lot of distracting churn: https://reviews.llvm.org/D27765.

aprantl added inline comments.Dec 14 2016, 11:09 AM
lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
666

The example I had in mind was this:

namespace { // Force c to be local.
  namespace A {
    int c = 42;
  }
}
using A::c;

Unfortunately it doesn't look like this works at the moment. We're dropping the constant somewhere.

lib/CodeGen/AsmPrinter/DwarfDebug.cpp
498–499

GVMap is used to collect all DIGlobalVariableExpressions that contribute to on global variable's location in the debug info. We need to iterate over all globals once to establish this mapping.

Then, for each global variable in the current CU we emit debug info for it and insert it into Processed to make sure we don't do it again.

I don't think that this two-step process can be shortened; let me know if you see a way that I missed.
But, I think that there is still an unhandled case, and that is a global variable with two constants in DIExpressions and no global symbol. I'll fix that and add a testcase.

Finally, to your last question: Yes, they are listed in the CU, but the global symbol is not reachable from the metadata; we refer to metadata via the !dbg attachment, not the other way round.

lib/Transforms/Instrumentation/AddressSanitizer.cpp
1656

I think this should just stay an auto *. This is a leftover from the PointerUnion days.

aprantl updated this revision to Diff 81433.Dec 14 2016, 11:57 AM
aprantl edited edge metadata.

Address outstanding issues.

aprantl updated this revision to Diff 81449.Dec 14 2016, 1:20 PM

It turns out, however, that the Verifier rejects imported DIGlobalVariableExpressions anyway, so I'm just removing that case.

aprantl updated this revision to Diff 81454.Dec 14 2016, 1:30 PM
aprantl marked 14 inline comments as done.

Address the comment about finalize() that I missed earlier.

dblaikie added inline comments.Dec 14 2016, 9:10 PM
lib/CodeGen/AsmPrinter/DwarfDebug.cpp
500

(terminology:
First loop: the "for globals" loop outside the CU loop. This one finds/associates GVEs on actual globals.
Second loop: the first "for global variables" loop inside the CU loop. This one finds any GVEs over constants (not associated with globals) - does the CU GV list not contain the GVEs associated with actual globals? (if it does contain those, then we might have duplicates in the first and second loops - which seems unfortunate)
Third loop: the second "for global variables" loop inside the CU loop - this one creates the global variables.

Should this be scoped outside of the CU loop?

(if the same global variable (eg: static in an inline function) is defined in multiple translation units - we only really need to emit it in one of them, not duplicate the description in all of them, right?)

Also - adding more things to the GVMap inside CU loop then processing the contents seems a bit strange - at best (& I think) benign, at worst one CU could leak something into another? (& likely somewhere in between: performance degredation by the final CU having many more things in the map than it needs, just making lookups slow, etc)

I think maybe the Processed set needs to move out of the CU loop, and we should remove entries from the GVMap in the Third Loop below (essentially migrating things out of the GVMap once they're handled - but recording that fact in the Processed (we can't just take absence as an indication because it might be absent because it's been optimized away to constants, so it wouldn't appear until the Second Loop - we shouldn't add things in the Second Loop if they're already in Processed, perhaps? Then we wouldn't need a conditional in the third loop?)

Oh, the 3rd loop could be over the GVMap, filtered by whether the unit is the current one. Which could be more efficient if there are many fragments of variables - which I guess is not teh common case, so the current pivot's probably better.

There do seem like an awful lot of lookups in many directions in these three loops. Hrm. :/

Maybe if we started off CU-specific? (the First Loop could create a map from CU -> GV -> GVEs, then you could do one lookup at the start of the cu loop, smaller lookups for the Second Loop (but probably still a second lookup for Processed - or this could be kept down in the Third Loop so we only needed to do it once per GV, instead of once per GVE in the CU), and the Third Loop could use no lookups at all, just loop over the GV->GVE mapping)

dblaikie accepted this revision.Dec 15 2016, 10:29 AM
dblaikie edited edge metadata.

Sounds good to me - If you want to check with Duncan too, up to you.

This revision is now accepted and ready to land.Dec 15 2016, 10:29 AM
This revision was automatically updated to reflect the committed changes.