This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label.
ClosedPublic

Authored by HsiangKai on Mar 29 2018, 6:54 AM.

Details

Summary

In order to set breakpoints on labels and list source code around labels, we need collect debug information for labels, i.e., label name, the function label belong, line number in the file, and the address label located. In order to keep these information in LLVM IR and to allow backend to generate debug information correctly. We create a new kind of metadata for labels, DILabel. The format of DILabel is

!DILabel(scope: !1, name: "foo", file: !2, line: 3)

We hope to keep debug information as much as possible even the code is optimized. So, we create a new kind of intrinsic for label metadata to avoid the metadata is eliminated with basic block. The intrinsic will keep existing if we keep it from optimized out. The format of the intrinsic is

llvm.dbg.label(metadata !1)

It has only one argument, that is the DILabel metadata. The intrinsic will follow the label immediately. Backend could get the label metadata through the intrinsic's parameter.

We also create DIBuilder API for labels to be used by Frontend. Frontend could use createLabel() to allocate DILabel objects, and use insertLabel() to insert llvm.dbg.label intrinsic in LLVM IR.

Diff Detail

Repository
rL LLVM

Event Timeline

HsiangKai created this revision.Mar 29 2018, 6:54 AM
aprantl requested changes to this revision.Mar 29 2018, 8:43 AM

Could you please elaborate what you want to use this new metadata node for? Adding a new metadata node has a pretty high maintenance cost, so I'd like to better understand what problem you want to solve. Depending on what you want to do there may be a more lightweight solution to your problem.

This revision now requires changes to proceed.Mar 29 2018, 8:43 AM

Ah. I found the thread on llvm-dev.

aprantl added inline comments.Mar 30 2018, 9:51 AM
include/llvm/IR/DebugInfoMetadata.h
2615

Why do we need to store the Line in the DILabel? Isn't it redundant with the DILocation !dbg attachment on the dbg.label intrinsic?

HsiangKai updated this revision to Diff 140626.Apr 2 2018, 7:59 AM
HsiangKai retitled this revision from [DebugInfo] Add DILabel metadata and create class DILabel. to [DebugInfo] Add DILabel metadata and intrinsic llvm.dbg.label..
HsiangKai edited the summary of this revision. (Show Details)

Restructure patches and add more test cases.

D45045 is the clang part of the implementation to support DILabel and llvm.dbg.label.

There should also be testcases that exercise the new verifier code.

include/llvm/IR/DebugInfoMetadata.h
2615

I don't think that it is necessary to store the line number in the DILabel. It is already stored in the !dbg DILocation attachment on the call.

lib/IR/AsmWriter.cpp
1978

I think file and line should just come from the DILocation. Is there a situation where the File in the DILabel would not match the one in the DILocation?

HsiangKai added inline comments.Apr 2 2018, 10:41 PM
lib/IR/AsmWriter.cpp
1978

The line number in DILocation is used for Instruction’s location. The line number is DILabel is used for Label’s location. Their meanings are different. They are just happened to be the same.

I reviewed llvm.dbg.declare implementation. It seems that the line number in DILocalVariable is also redundant. I am not very clear why there existed redundant line information in DILocalVariable. The only reason I guess is they have different meanings in logic and the implementation is easier if metadata has line number in itself. Actually, the line number of variable is got from variable’s metadata instead of derived from its intrinsic’s DILocation.

There may be other reasons I did not aware. However, if it is really redundant. I think there should be another patch to review the design in all metadata instead of in this patch. I prefer to be consistent with other metadata design concept.

chenwj added a comment.Apr 3 2018, 6:53 AM

Nitpick.

lib/IR/AsmWriter.cpp
1978

I agree with your last point. We can have another patch doing the cleanup, that makes things clear.

lib/IR/DIBuilder.cpp
711

Since you don't use Node after creating it, I think you can return DILabel::get directly.

aprantl added inline comments.Apr 3 2018, 9:01 AM
lib/IR/AsmWriter.cpp
1978

I reviewed llvm.dbg.declare implementation. It seems that the line number in DILocalVariable is also redundant. I am not very clear why there existed redundant line information in DILocalVariable. The only reason I guess is they have different meanings in logic and the implementation is easier if metadata has line number in itself. Actually, the line number of variable is got from variable’s metadata instead of derived from its intrinsic’s DILocation.

For llvm.dbg.value, the line number stored in DILocalVariable is used to set the DW_AT_decl_line of the variable information in the .debug_info section. The DILocation of the various intrinsic calls that describe locations for this variable is currently unused except for the inliendAt: part. The line number in the DILocalVariable is needed for variables that are entirely optimized away: in this case no debug info intrinsic is left and the DILocationVariable is only reachable via the DISubprogram's list of local variables.

In the case of DILabel, I don't think there is any value in keeping around labels that were entirely optimized away, so there is no harm in depending on the information stored in the DILocation attachment of the intrinsic. DO you think this is reasonable?

There may be other reasons I did not aware. However, if it is really redundant. I think there should be another patch to review the design in all metadata instead of in this patch. I prefer to be consistent with other metadata design concept.

I disagree with this approach, both for the reasons outlined above, and because for every metadata change we also need to implement a bitcode upgrade in metadataloader which we need to carry around for an indefinite time.

probinson added inline comments.
lib/IR/AsmWriter.cpp
1978

If a label is optimized away, do we still emit the DW_TAG_label? If so, then the source location should be kept in the DILabel and not rely on the intrinsic. Otherwise the DWARF will have a DW_TAG_label with a name and no other information at all.

aprantl added inline comments.Apr 5 2018, 1:46 PM
lib/IR/AsmWriter.cpp
1978

My understanding was that in the latest revision of the design we don't have anywhere to attach the DILabel to when it is optimized away. If we do care about DILabels that have been optimized away (I don't see why we would want to) we would have to add a list of labels to the DISubprogram similar to how we keep dead variables around. But I don't htink that that is worth it.

My understanding was that in the latest revision of the design we don't have anywhere to attach the DILabel to when it is optimized away. If we do care about DILabels that have been optimized away (I don't see why we would want to) we would have to add a list of labels to the DISubprogram similar to how we keep dead variables around. But I don't htink that that is worth it.

Hmm if I compile a function containing 3 labels and the debugger knows about only 2, I might think the compiler has a bug.
We keep dead variables around so we can correctly report they have been optimized away. Thinking ahead to supporting languages where labels are far more common than in C/C++ (for example I know the OpenVMS guys will want to support COBOL) I think a labels list might be appropriate.

But OpenVMS will likely need other stuff, so if you would rather wait on a labels list until that happens, that's fine.

My understanding was that in the latest revision of the design we don't have anywhere to attach the DILabel to when it is optimized away. If we do care about DILabels that have been optimized away (I don't see why we would want to) we would have to add a list of labels to the DISubprogram similar to how we keep dead variables around. But I don't htink that that is worth it.

Hmm if I compile a function containing 3 labels and the debugger knows about only 2, I might think the compiler has a bug.
We keep dead variables around so we can correctly report they have been optimized away. Thinking ahead to supporting languages where labels are far more common than in C/C++ (for example I know the OpenVMS guys will want to support COBOL) I think a labels list might be appropriate.

In the end I'm fine with both variants, but if we go for the line field in the DILabel I think we should also implement support for the labels list in DISubprogram or at least document our intention to add this clearly.

Rather than adding another list (if/when it comes to that) - we might want
to generalize the existing list to contain any/all local entities.

My understanding was that in the latest revision of the design we don't have anywhere to attach the DILabel to when it is optimized away. If we do care about DILabels that have been optimized away (I don't see why we would want to) we would have to add a list of labels to the DISubprogram similar to how we keep dead variables around. But I don't htink that that is worth it.

Hmm if I compile a function containing 3 labels and the debugger knows about only 2, I might think the compiler has a bug.
We keep dead variables around so we can correctly report they have been optimized away. Thinking ahead to supporting languages where labels are far more common than in C/C++ (for example I know the OpenVMS guys will want to support COBOL) I think a labels list might be appropriate.

In the end I'm fine with both variants, but if we go for the line field in the DILabel I think we should also implement support for the labels list in DISubprogram or at least document our intention to add this clearly.

I agree. I think the label information should be kept even intrinsic are optimized out such that users could query source code around labels. I will create the labels list in DISubprogram and resend the patch. Thanks.

Probably best not to create another list on the DISubprogram - you might be
able to generalize the existing list to contain "anything that might be
optimized out"

Katya (CC'd), over at Sony, has been looking at a similar issue/possible
change to move function-local using declarations/directives into such a
list as well.

Probably best not to create another list on the DISubprogram - you might be
able to generalize the existing list to contain "anything that might be
optimized out"

Katya (CC'd), over at Sony, has been looking at a similar issue/possible
change to move function-local using declarations/directives into such a
list as well.

Good idea. I will try to figure out how to do it. If it is possible to combine all local
entities into one list, I will do it.

Probably best not to create another list on the DISubprogram - you might be
able to generalize the existing list to contain "anything that might be
optimized out"

Katya (CC'd), over at Sony, has been looking at a similar issue/possible
change to move function-local using declarations/directives into such a
list as well.

Good idea. I will try to figure out how to do it. If it is possible to combine all local
entities into one list, I will do it.

Sounds good.

HsiangKai updated this revision to Diff 142000.Apr 11 2018, 7:17 AM

Add DILabel list to DISubprogram. The new DISubprogram will be

DISubprogram(..., variables: !7, labels: !8)

To maintain all local entities of the function in one list is not trivial modification. I think it should not relate to label debugging. The refactoring of these lists into one should be another patch.

So, to enable label debugging, I use a separate list to maintain labels in the function. I add a test case in D45045 under Clang repo.

I realize it's non-trivial to keep these all in the same list - but I'd
rather not create a new list only to remove it again later if it can be
helped. What particular issues did you encounter when attempting to use a
single list?

HsiangKai updated this revision to Diff 142530.Apr 14 2018, 6:55 PM

Merge label metadata list into variable metadata list. The new metadata list is called 'elements' list. I do not merge other metadata lists into this one.

Update test case.

I realize it's non-trivial to keep these all in the same list - but I'd
rather not create a new list only to remove it again later if it can be
helped. What particular issues did you encounter when attempting to use a
single list?

I have merged labels metadata list into variables metadata list.

Looking pretty good so far, a couple more details inline.

include/llvm/IR/DebugInfoMetadata.h
1614

Elements is a bit too nondescript. Can we call it RetainedNodes or RetainedIntrinsics (like the RetainedTypes in DICompileUnit)?

2615

This is fine now, because we retain the labels in the subprogram.

include/llvm/IR/IntrinsicInst.h
178

There's a way to express this is doxygen:

/// Methods for support type inquiry through isa, cast, and dyn_cast:
/// @{
...
/// @}
lib/AsmParser/LLParser.cpp
4346

same here: maybe retainedNodes: or retainedIntrinsics:

lib/IR/DIBuilder.cpp
719

I don't understand this comment. Shouldn't it say "append it to the list of retained nodes of the DISubprogram"?

lib/IR/LLVMContextImpl.h
969

Do you think there is value in hashing the name and file an line? It seems like name+line or file+line should already be mostly unique and the comparison function will take care of collisions.

test/DebugInfo/Generic/debug-label-bitcode.ll
1 ↗(On Diff #142530)

bitcode reader tests should go into test/Assembler and should do a double-roundtrip:

; RUN: llvm-as < %s | llvm-dis | llvm-as | llvm-dis | FileCheck %s
; RUN: verify-uselistorder %s

HsiangKai marked 7 inline comments as done.
  • Rename elements in DISubprogram to retainedNodes.
  • Update test cases.
  • Update according to review comments.
nhaehnle removed a subscriber: nhaehnle.Apr 20 2018, 4:56 AM
aprantl accepted this revision.Apr 20 2018, 8:03 AM

Thanks! Two more comments inline, LGTM otherwise.

lib/IR/Verifier.cpp
1094

Let's be specific: "expected DILocalVariable or DILabel."

4595

Comment should go above the line.

This revision is now accepted and ready to land.Apr 20 2018, 8:03 AM
HsiangKai marked 4 inline comments as done.
  • Update according to review comments.

I found a few more nitpicks, but my overall LGTM still stands.

include/llvm/IR/DebugInfoMetadata.h
2607

This just repeats the name of the class.
How about something like: This represents a source-level label, such as a C goto label.

include/llvm/IR/IntrinsicInst.h
175

llvm_unreachable() ?

test/Assembler/debug-label-bitcode.ll
55

I think for all other constructs we also have negative testcases that verify that
!DILabel(name: "done", file: !1, line: 7)
!DILabel(scope: !4, file: !1, line: 7)
!DILabel(scope: !4, name: "done", line: 7)
!DILabel(scope: !4, name: "done", file: !1)

are rejected.

This revision was automatically updated to reflect the committed changes.