This is an archive of the discontinued LLVM Phabricator instance.

IR: Move the complex address expression field out of DIVariable and into an extra argument of the dbg.declare/dbg.value intrinsics.
ClosedPublic

Authored by aprantl on Aug 14 2014, 4:54 PM.

Details

Reviewers
dblaikie
echristo
Summary

Here's a wild idea:

Currently, DIVariable is a variable-length field that has an optional reference to a Metadata array consisting of a variable number of complex address expressions. In the case of OpPiece expressions this is wasting a lot of storage in IR, because when an aggregate type is, e.g., SROA'd into all of its n individual members, the IR will contain n copies of the DIVariable, all alike, only differing in the complex address reference at the end.

By making the complex address into an extra argument of the dbg.value/dbg.declare intrinsics, all of the pieces can reference the same variable and the complex address expressions can be uniqued across the CU, too.
Down the road, this will allow us to move other flags, such as "indirection" out of the DIVariable, too.

The new intrinsics look like this:
declare void @llvm.dbg.declare(metadata %storage, metadata %expr, metadata %var)
declare void @llvm.dbg.value(metadata %storage, i64 %offset, metadata %expr, metadata %var)

This patch adds a unique offset to the values of ComplexAddrKind that doubles as a tag, so we can detect and pretty-print DIExpression metadata nodes.

What this patch doesn't do:

  • This patch does not touch the "Indirect" field in DIVariable; but moving that into the expression would be a natural next step.

Diff Detail

Event Timeline

aprantl updated this revision to Diff 12535.Aug 14 2014, 4:54 PM
aprantl retitled this revision from to IR: Move the complex address expression field out of DIVariable and into an extra argument of the dbg.declare/dbg.value intrinsics..
aprantl updated this object.
aprantl edited the test plan for this revision. (Show Details)
aprantl set the repository for this revision to rL LLVM.
aprantl added a project: deleted.
aprantl added subscribers: Unknown Object (MLST), dblaikie, echristo.
aprantl updated this revision to Diff 12719.Aug 20 2014, 4:52 PM
aprantl updated this object.

This updated version of the patch passes all tests.
Still outstanding:

  • Auto-upgrading old-style intrinsics
  • DIExpression tagging (for pretty-dumping) My plan is to add an offset to the ComplexAddrKind enum values (DW_TAG_user_low?) so DIExpressions can be recognized.
In D4919#6, @aprantl wrote:

This updated version of the patch passes all tests.
Still outstanding:

  • Auto-upgrading old-style intrinsics

Do we need this? We have the auto-upgrader stripping debug info already, don't we? Does it not strip the intrinsics? I assume we should just strip them if we aren't already.

  • DIExpression tagging (for pretty-dumping) My plan is to add an offset to the ComplexAddrKind enum values (DW_TAG_user_low?) so DIExpressions can be recognized.

I'm still really on the fence about the whole annotated metadata support... it really is just getting lucky about certain integer prefixes. Is there something less likely to collide that we could/should use?

If we do continue to decide just to use integers, yeah, I suppose we could add a number to that enum. I'd probably generate a random number for the task (& write some clear comments about why that is). DW_TAG_user_low would just be confusing two mostly unrelated things for no particular benefit (I think it'd be confusing - it's like doing arithmetic with differently dimensioned quantities).

  • David

If we also bump the metadata version number with this patch, the debug info will be stripped. Originally I thought that AsmParser would still throw an error because the intrinsics are missing an argument, but this is not the case, I just verified that. So this is good, that removes the autoupgrade requirement!

My rationale for DW_TAG_user_lo is that this way we can guarantee that all DIDescriptor tags (except line table entries) are within the range [DW_TAG_array_type | LLVM_DebugVersion .. DW_TAG_user_hi|LLVM_DebugVersion].

My current suggestion is to use the end of the DW_TAG_user range for this purpose:

enum ComplexAddrKind {
  OpFirst = dwarf::DW_TAG_hi_user - dwarf::DW_OP_hi_user,
  OpPlus  = OpFirst + dwarf::DW_OP_plus,
  OpDeref = OpFirst + dwarf::DW_OP_deref,
  OpPiece = OpFirst + dwarf::DW_OP_piece,
  OpLast = OpPiece
};
In D4919#10, @aprantl wrote:

If we also bump the metadata version number with this patch, the debug info will be stripped. Originally I thought that AsmParser would still throw an error because the intrinsics are missing an argument, but this is not the case, I just verified that. So this is good, that removes the autoupgrade requirement!

Excellent!

My rationale for DW_TAG_user_lo is that this way we can guarantee that all DIDescriptor tags (except line table entries) are within the range [DW_TAG_array_type | LLVM_DebugVersion .. DW_TAG_user_hi|LLVM_DebugVersion].

In what way is this beneficial? I suppose it helps us reserve different namespaces for different metadata annotations. Are we annotating any other metadata? How are we identifying that metadata? If we are, I assume we'd want an overall scheme, but I doubt we are.

My current suggestion is to use the end of the DW_TAG_user range for this purpose:

enum ComplexAddrKind {
  OpFirst = dwarf::DW_TAG_hi_user - dwarf::DW_OP_hi_user,
  OpPlus  = OpFirst + dwarf::DW_OP_plus,
  OpDeref = OpFirst + dwarf::DW_OP_deref,
  OpPiece = OpFirst + dwarf::DW_OP_piece,
  OpLast = OpPiece
};

The reason I shy away from this is that I think, as I mentioned, it confuses what these values are - they're not tags, they're just identifiers. (I mean I suppose the same is true of DW_TAG_arg_variable/param_variable or whatever it is we added that are non-standard 'tags' - but that's a bit more justified because they're in the same part of the schema - they have to be unique there/in the same range)

aprantl updated this revision to Diff 12820.Aug 21 2014, 5:43 PM
aprantl updated this object.

Updated patch with updated ComplexAddrKind enum values and bumped version number.

In D4919#11, @dblaikie wrote:
In D4919#10, @aprantl wrote:

If we also bump the metadata version number with this patch, the debug info will be stripped. Originally I thought that AsmParser would still throw an error because the intrinsics are missing an argument, but this is not the case, I just verified that. So this is good, that removes the autoupgrade requirement!

Excellent!

My rationale for DW_TAG_user_lo is that this way we can guarantee that all DIDescriptor tags (except line table entries) are within the range [DW_TAG_array_type | LLVM_DebugVersion .. DW_TAG_user_hi|LLVM_DebugVersion].

In what way is this beneficial? I suppose it helps us reserve different namespaces for different metadata annotations.

Exactly!

Are we annotating any other metadata? How are we identifying that metadata? If we are, I assume we'd want an overall scheme, but I doubt we are.

Currently we aren't (in fact, TBAA is the *only* other kind of metadata that I'm aware of), but I think it would be nice if we could in the future.
The -blocks testcases are a nice example of just how useful having a human-readable comment can be.

If you feel that using the DW_TAG_user range has too much of a potential for conflicts, we could also allocate an adjacent range.

My current suggestion is to use the end of the DW_TAG_user range for this purpose:

enum ComplexAddrKind {
  OpFirst = dwarf::DW_TAG_hi_user - dwarf::DW_OP_hi_user,
  OpPlus  = OpFirst + dwarf::DW_OP_plus,
  OpDeref = OpFirst + dwarf::DW_OP_deref,
  OpPiece = OpFirst + dwarf::DW_OP_piece,
  OpLast = OpPiece
};

The reason I shy away from this is that I think, as I mentioned, it confuses what these values are - they're not tags, they're just identifiers. (I mean I suppose the same is true of DW_TAG_arg_variable/param_variable or whatever it is we added that are non-standard 'tags' - but that's a bit more justified because they're in the same part of the schema - they have to be unique there/in the same range)

dblaikie added inline comments.Aug 26 2014, 3:52 PM
include/llvm/CodeGen/MachineModuleInfo.h
395

I'd probably put the Var at the start - sort of the broadest thing first. But it's not a strong preference & certainly no great argument from consistency. (& I won't repeat this feedback for all the various similar APIs that have the same ordering choice)

include/llvm/IR/DebugInfo.h
722

Does this benefit/need to be a DIDescriptor? I suppose in dumping code it might be handy (?) but generally we're never going to have a generic DIDescriptor and have to ask if it's a DIExpression. There's only one place we'll find them while navigating the debug info metadata - and that one place there won't be anything else we expect to find.

include/llvm/IR/Metadata.h
34 ↗(On Diff #12820)

Did we ever figure out the answer to rolling the version number? I assume this is one of the reasons you have so much test fallout - is there any way we can commit this change separately? (I realize that'll leave a revision or two that don't make sense) just from a review-ability perspective? (not that I've looked over the tests - but it'd possibly be nicer to be able to at least eyeball a few that actually have schema changes versus those that just need the revision number updated)

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
619

This looks like cleanup (could be a separate, precursor patch), no?

lib/IR/DIBuilder.cpp
1087

What's this for? (I haven't seen any callers)

If needed, I'd just implement it by having the above function default the argument to None:

createExpression(Arrayref<Value *> Addr = None)

In D4919#11, @dblaikie wrote:
In D4919#10, @aprantl wrote:

If we also bump the metadata version number with this patch, the debug info will be stripped. Originally I thought that AsmParser would still throw an error because the intrinsics are missing an argument, but this is not the case, I just verified that. So this is good, that removes the autoupgrade requirement!

Excellent!

My rationale for DW_TAG_user_lo is that this way we can guarantee that all DIDescriptor tags (except line table entries) are within the range [DW_TAG_array_type | LLVM_DebugVersion .. DW_TAG_user_hi|LLVM_DebugVersion].

In what way is this beneficial? I suppose it helps us reserve different namespaces for different metadata annotations.

Exactly!

Are we annotating any other metadata? How are we identifying that metadata? If we are, I assume we'd want an overall scheme, but I doubt we are.

Currently we aren't (in fact, TBAA is the *only* other kind of metadata that I'm aware of), but I think it would be nice if we could in the future.
The -blocks testcases are a nice example of just how useful having a human-readable comment can be.

If you feel that using the DW_TAG_user range has too much of a potential for conflicts, we could also allocate an adjacent range.

My current suggestion is to use the end of the DW_TAG_user range for this purpose:

enum ComplexAddrKind {
  OpFirst = dwarf::DW_TAG_hi_user - dwarf::DW_OP_hi_user,
  OpPlus  = OpFirst + dwarf::DW_OP_plus,
  OpDeref = OpFirst + dwarf::DW_OP_deref,
  OpPiece = OpFirst + dwarf::DW_OP_piece,
  OpLast = OpPiece
};

The reason I shy away from this is that I think, as I mentioned, it confuses what these values are - they're not tags, they're just identifiers. (I mean I suppose the same is true of DW_TAG_arg_variable/param_variable or whatever it is we added that are non-standard 'tags' - but that's a bit more justified because they're in the same part of the schema - they have to be unique there/in the same range)

include/llvm/CodeGen/MachineModuleInfo.h
395

Whichever way we go, we _have_ to be consistent. The fact that both are MDNode pointers is just asking for bugs.
I went with this order to be consistent with the intrinsic
llvm.dbg.value(%val, i64 offset, !metadata Var)
which I extended to
(1) llvm.dbg.value(%val, i64 offset, !metadata Expr, !metadata Var)

If we go with your suggestion, it would be better to also change the intrinsic to
(2) llvm.dbg.value(%val, i64 offset, !metadata Var, !metadata Expr)
but that is inconsistent with the position of offset, unless we change it to
(3) llvm.dbg.value(%val, !metadata Var, i64 offset, !metadata Expr)

[That said, one day in the future we may want to roll the offset into Expr, so maybe we can skip (3) as an intermediate step.]

My preference is (1) simply because it is consistent with the previous intrinsic.

include/llvm/IR/DebugInfo.h
722

You answered that question yourself :-)
Yes, it is necessary for dump(), and dump() is essential for the testsuite.

include/llvm/IR/Metadata.h
34 ↗(On Diff #12820)

IMO we have to bump the version number together with a change that changes the metadata layout. Otherwise, if we had
r1 - base
r2 - update metadata
r3 - bump version

users that LTO a module compiled with r1 and a module compiled with r2 will get an assertion/crash. Now, you might argue that we don't need to cater for users living off trunk, but it would also be hard to tell which commits warrant a version bump when, e.g, cherry-picking to an llvm-release branch. Whatever consensus we reach, please let's document it somewhere :-)

lib/CodeGen/AsmPrinter/AsmPrinter.cpp
619

Agreed!

lib/IR/DIBuilder.cpp
1087

Ah, that does look much better, yes! Thanks.

aprantl updated this revision to Diff 13609.Sep 11 2014, 3:51 PM

This should address all previous comments.
The biggest changes are:

  • DIExpressions now come _after_ DIVariables, in all function signatures, everywhere.
  • DIExpressions now have their own LLVM-internal DW_TAG_expression to facilitate pretty-printing and verification.
  • More dynamic type-safety (is that even a thing?) assertions.
  • Got rid of the debug version number bump. I found the last discussion and we agreed to only bump it for point releases.

Permissions change on test/BugPoint/compile-custom.ll, test/MC/COFF/linker-options.ll, test/Verifier/inalloca-vararg.ll - what's that about?

These tests would probably be easier to review if we made a separate change to add a no-op 4th operand to the intrinsic, and committed that along with the test case updates necessary - then did the change to catually move stuff. But I'm not sure if that's worthwhile - just a thought, not a request.

lib/CodeGen/AsmPrinter/DwarfUnit.cpp
618

How come this turned into an explicit special case? I assume in the old code this was handled by the code on 618 (in the before source)?

test/DebugInfo/X86/autoupgrade-intrinsics.s
1 ↗(On Diff #13609)

What's this file for? Is it referenced from some test?

aprantl updated this revision to Diff 13812.Sep 17 2014, 5:53 PM
aprantl removed subscribers: echristo, dblaikie.
  • clean up & rebase patch (get rid of extra files / unintended permission changes)
  • fix a bug in variableHasComplexAddress() that removed the need for special-casing indirect variables in addComplexAddress(). Thanks, David!
mcrosier removed a subscriber: mcrosier.Sep 18 2014, 6:41 AM
In D4919#20, @dblaikie wrote:

These tests would probably be easier to review if we made a separate change to add a no-op 4th operand to the intrinsic, and committed that along with the test case updates necessary - then did the change to catually move stuff. But I'm not sure if that's worthwhile - just a thought, not a request.

I'm not yet convinced this would make reviewing the test cases that much easier: For the majority of the testcases (those that don't deal with complex address expressions) the 4th operand is a nullptr anyway, and for the remainder the second diff will be just as complex as before?

dblaikie edited edge metadata.Sep 18 2014, 9:46 AM
In D4919#26, @aprantl wrote:
In D4919#20, @dblaikie wrote:

These tests would probably be easier to review if we made a separate change to add a no-op 4th operand to the intrinsic, and committed that along with the test case updates necessary - then did the change to catually move stuff. But I'm not sure if that's worthwhile - just a thought, not a request.

I'm not yet convinced this would make reviewing the test cases that much easier: For the majority of the testcases (those that don't deal with complex address expressions) the 4th operand is a nullptr anyway, and for the remainder the second diff will be just as complex as before?

Yep, that's mostly my point - getting all the trivial cases migrated over makes it easier to pay attention to the complex ones - scrolling through all the trivial cases trying to spot the interesting ones is likely not to produce very good attention to detail. Not a lot of signal/noise.

Not a big deal either way, though, just a thought.

aprantl updated this revision to Diff 13881.Sep 19 2014, 11:57 AM
aprantl updated this object.
aprantl edited edge metadata.

This revision adds auto-upgrading for old-style dbg.declare and dbg.value intrinsics.
This should be the last missing piece.

dblaikie added inline comments.Sep 29 2014, 2:52 PM
include/llvm/IR/DebugInfo.h
756

When would this ever be zero? Can we just assert that it's not?

(I assume it should be at least 2 - since the expression has to contain some operations, presumably?)

lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1394

I'd probably just create a DIExpression() inline rather than having a separate variable (when I see it written as a separate variable I wonder if it's an unused out parameter or somesuch). Up to you.

Or consider providing a different ctor for abstract variables that just doesn't take a DIExpression anyway? Not sure.

Thanks!
I'll fix these two things up before committing. Do you have any other comments that should block committing?

include/llvm/IR/DebugInfo.h
756

I believe this may be a leftover from when DIExpression did not have its own TAG. We should just assert that N>0.

lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1394

Sure.

dblaikie accepted this revision.Sep 29 2014, 4:16 PM
dblaikie edited edge metadata.
In D4919#34, @aprantl wrote:

Thanks!
I'll fix these two things up before committing. Do you have any other comments that should block committing?

Nope, don't think so - we'll figure it out in tree as we see how it all settles out, I'd imagine.

This revision is now accepted and ready to land.Sep 29 2014, 4:16 PM
aprantl closed this revision.Oct 3 2014, 5:31 PM