This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Enable debug information for C99 VLA types
ClosedPublic

Authored by sdesmalen on Jan 3 2018, 3:04 AM.

Details

Summary

This patch enables debugging of C99 VLA types by generating more precise
LLVM Debug metadata, using the extended DISubrange 'count' field that
takes a DIVariable.

This should implement:

Bug 30553: Debug info generated for arrays is not what GDB expects (not as good as GCC's)

https://bugs.llvm.org/show_bug.cgi?id=30553

Diff Detail

Repository
rL LLVM

Event Timeline

sdesmalen created this revision.Jan 3 2018, 3:04 AM
fhahn added a subscriber: fhahn.Jan 3 2018, 3:15 AM

It would be awesome if you could also add an end-to-end test to the debuginfo-tests repository so we can verify that this actually works in LLDB and GDB.

lib/CodeGen/CGDebugInfo.cpp
2358 ↗(On Diff #128508)

perhaps write sizeNode/Count to a variable, so you don't have to duplicate the expression?

3473 ↗(On Diff #128508)

Could this function be replaced by a default argument instead?

lib/CodeGen/CGDebugInfo.h
86 ↗(On Diff #128508)

I'm expecting VLA's to not be very common, should we use a regular DenseMap instead?

317 ↗(On Diff #128508)

This is performing the lookup twice. If you use .find() instead it will be more efficient. We also don't use accessor functions like this for any of the other caches. If you think that they help, perhaps make this more generic and useful for all caches?

404 ↗(On Diff #128508)

do we need the qualifier on Optional?

lib/CodeGen/CGDecl.cpp
1125 ↗(On Diff #128508)

Please move this code into a member function of CGDebugInfo.

1137 ↗(On Diff #128508)

.

lib/CodeGen/CodeGenFunction.cpp
1963 ↗(On Diff #128508)

return {VlaSize, Vla->getElementType()};

davide added a subscriber: davide.Jan 3 2018, 2:37 PM
sdesmalen updated this revision to Diff 129943.Jan 16 2018, 6:08 AM

Refactoring based on @aprantl's suggestions.

sdesmalen marked 3 inline comments as done.Jan 16 2018, 6:10 AM
sdesmalen added inline comments.
lib/CodeGen/CGDebugInfo.cpp
2358 ↗(On Diff #128508)

Unfortunately this is necessary because SizeNode and Count are different types, and therefore map to a different function signature of getOrCreateSubrange()

lib/CodeGen/CGDebugInfo.h
86 ↗(On Diff #128508)

Is that not the argument to use a SmallDenseMap instead? The documentation for DenseMap says:
"Also, because DenseMap allocates space for a large number of key/value pairs (it starts with 64 by default), it will waste a lot of space if your keys or values are large."

There is no documentation for SmallDenseMap, but the name suggests it would be optimized for maps with only a few values?

317 ↗(On Diff #128508)

Thanks, I've now removed the accessor function and replaced its uses with the SizeExprCache.find() method as you suggested.

lib/CodeGen/CGDecl.cpp
1125 ↗(On Diff #128508)

I moved this to a separate method in CodeGenFunction instead, since it also has to create the actual store (next to creating the DebugInfo), which I don't think should be in CGDebugInfo. Let me know what you think.

lib/CodeGen/CodeGenFunction.cpp
1963 ↗(On Diff #128508)

Thanks for the suggestion, that looks much better!

aprantl added inline comments.Jan 16 2018, 10:15 AM
lib/CodeGen/CGDebugInfo.h
86 ↗(On Diff #128508)

I think you're right. This SmallDenseMap is not being allocated on the stack, so it is the right tradeoff here.

Just a gentle reminder that this patch still needs to be accepted (the LLVM support for it has been merged).

aprantl added inline comments.Jan 26 2018, 9:31 AM
lib/CodeGen/CGDebugInfo.h
395 ↗(On Diff #129943)

I think

llvm::DILocalVariable* EmitDeclareOfAutoVariable(const VarDecl *Decl, llvm::Value *AI,
                                 CGBuilderTy &Builder);

would be more natural.

474 ↗(On Diff #129943)

same here. Why not just use a return value?

lib/CodeGen/CGDecl.cpp
958 ↗(On Diff #129943)

LLVM coding style wants this comment to be on the member function declaration in the header file and only there. Also we don't repeat the function name in the comment any more.

984 ↗(On Diff #129943)

s/'fake'/artificial/

990 ↗(On Diff #129943)

I think it does, but can you assert me that this generates the same code with and without -g ?

lib/CodeGen/CodeGenFunction.h
2198 ↗(On Diff #129943)

How about defining a struct with named members for improved readability? I think you can still use the return {a, b} syntax.

sdesmalen updated this revision to Diff 131990.Jan 30 2018, 9:51 AM
sdesmalen marked 4 inline comments as done.
  • Changed return type of getVLASize() to a struct with named members.
  • EmitDeclare and EmitDeclareOfAutoVariable now return a DILocalVariable* rather than it being returned in a pointer that was passed as default argument.
sdesmalen added inline comments.Jan 30 2018, 9:53 AM
lib/CodeGen/CGDebugInfo.h
474 ↗(On Diff #129943)

Initially I thought it would make more sense for the EmitDeclare function to return the actual call instruction to llvm.dbg.declare, but I see that returning DILocalVariable* makes more sense and that this is in line with the other methods of this class.

lib/CodeGen/CGDecl.cpp
990 ↗(On Diff #129943)

I'm not really sure what you mean with 'generates the same code', because without -g this function is not invoked? However, this function only affects debug information, not anything else related to the actual creation of the VLA. So without -g, no 'dbg.declare()' or corresponding DILocalVariables are generated for each subexpression of each dimension.

aprantl added inline comments.Jan 30 2018, 10:02 AM
lib/CodeGen/CGDecl.cpp
990 ↗(On Diff #129943)

What I meant is that regardless of code being compile with or without -g, the .text section of the resulting binary is identical.

aprantl accepted this revision.Jan 30 2018, 10:03 AM
This revision is now accepted and ready to land.Jan 30 2018, 10:03 AM
sdesmalen added inline comments.Jan 31 2018, 4:48 AM
lib/CodeGen/CGDecl.cpp
990 ↗(On Diff #129943)

Actually, before I commit this patch, I just realize this is not the case. At least not with -O0 since it creates a stackobject to store the vla-size subexpression, which is referenced by the dbg.declare(). We can assume this store/alloca to be DCE'd at optimization levels higher than O0 so it should not have any impact, but strictly taken the .text section will not be the same with and without -g at -O0.

aprantl requested changes to this revision.Jan 31 2018, 8:51 AM

The correct way to fix this is to generate the stack object unconditionally (it will get optimized away by mem2reg when optimizations are enabled) and only emit the dbg.declare when -g is present.

This revision now requires changes to proceed.Jan 31 2018, 8:51 AM
sdesmalen updated this revision to Diff 132275.Jan 31 2018, 2:16 PM
  • Now always emit an alloca for a VLA dimension expression (regardless of whether -g is passed).
  • Fixed up some more tests since it now also triggers for all tests with variable length arrays that don't pass -g.
  • Refactored EmitAndRegisterVariableArrayDimensions into two phases (one phase to create the alloca, another to create the llvm.dbg.declare call).
aprantl accepted this revision.Jan 31 2018, 2:27 PM
aprantl added inline comments.
lib/CodeGen/CGDecl.cpp
994 ↗(On Diff #132275)

perhaps factor this out into a variable for readability?

This revision is now accepted and ready to land.Jan 31 2018, 2:27 PM
sdesmalen marked an inline comment as done.Feb 1 2018, 3:26 AM

Thanks for your review @aprantl!

This revision was automatically updated to reflect the committed changes.
This revision was automatically updated to reflect the committed changes.

Hi @sdesmalen!

First of all my apologies for commenting after the issue has been closed, but I do not have an account to add a comment to the associated bugzilla.

I have found what it seems to be an issue with the current implementation.

For the given test case

int main() {
  int size = 2;

  int vla_expr[size];
  vla_expr[1] = 1;

  return 0;
}

and while debugging with LLDB, the following error is generated:

(lldb) n
Process 21014 stopped
* thread #1, name = 'bad.out', stop reason = step over
    frame #0: 0x0000000000400502 bad.out`main at vla_2.cpp:7
   4   	  int vla_expr[size];
   5   	  vla_expr[1] = 1;
   6   	
-> 7   	  return 0;
   8   	}

(lldb) p vla_expr
(unsigned long) $0 = 2

(lldb) p vla_expr[1]
error: subscripted value is not an array, pointer, or vector

(lldb)

Looking at the DWARF generated, there are 2 variables with the same name at the same scope

DW_TAG_subprogram "main"
  ...
  DW_TAG_variable "size"
  DW_TAG_variable "vla_expr"
  DW_TAG_variable "vla_expr"

I think there are 2 issues:

The compiler generated variable 'vla_expr'

  • should be flagged as artificial (DW_AT_artificial)
  • its name should start with double underscore to avoid conflicting with user-defined names.

Thanks,
Carlos

Hi @sdesmalen!

For the following test case

int main() {
  int size = 2;

  int var[size];
  var[1] = 1;

  return 0;
}

I compared the DWARF generated by GCC and it looks like

DW_TAG_variable "var"
  DW_AT_location ...
  DW_AT_type DW_FORM_ref4
    DW_TAG_array_type 
      DW_AT_type -> "int"
      DW_TAG_subrange_type 
        DW_AT_type -> "sizetype"
        DW_AT_upper_bound DW_FORM_exprloc [4] = { DW_OP_fbreg 0xffffffb8 DW_OP_deref }

GCC use DW_AT_upper_bound with an associated location expression to describe the VLA boundaries.

In order to reduce the side effects created by the artifical-variable as described in my previous comment and to keep the generated DWARF within a reasonable size, I would suggest the GCC aproach as a size optimization.

The DWARF description of the artificial-variable could be removed and its location expression used by the array's subrange_type, instead of the subrange_type making a reference to the artificial-variable.