Debug info: Support fragmented variables.
ClosedPublic

Authored by aprantl on Feb 3 2014, 3:16 PM.

Details

Reviewers
chandlerc
aprantl
Summary

Reposting in Phabricator as requested.

Currently, LLVM cannot represent locations for aggregate variables that
are fragmented across multiple Values. This situation arises, e.g.,
during SROA, or even as part of the x86_64 calling convention when a struct
is passed by value. This patch adds the functionality to emit
DWARF location expressions using DW_OP(_bit)_piece to express partial
values.

This is implemented by adding a new operation type OpPiece to complex
DIVariables which accepts an offset and a size.

The canonical example would be something like this:

typedef struct { long int a; int b;} S;

int foo(S s) {
     return s.b;
}

which at -O1 is now codegen’d into:

; Function Attrs: nounwind readnone ssp uwtable
define i32 @foo(i64 %s.coerce0, i32 %s.coerce1) #0 {
entry:
 tail call void @llvm.dbg.value(metadata !{i64 %s.coerce0}, i64 0, metadata !20)
 tail call void @llvm.dbg.value(metadata !{i32 %s.coerce1}, i64 0, metadata !21)
 ret i32 %s.coerce1, !dbg !24
}

with this patch we'd emit the following DWARF:

0x00000047:         TAG_formal_parameter [3]  
                  AT_name( "s" )
                  AT_decl_file( "struct.c" )
                  AT_decl_line( 3 )
                  AT_type( {0x00000069} ( S ) )
                  AT_location( 0x00000000
                     0x0000000000000000 - 0x0000000000000008: rdi, piece 0x00000008
                     0x0000000000000000 - 0x0000000000000006: rsi, bit-piece 32 64 
                     0x0000000000000006 - 0x0000000000000008: rax, bit-piece 32 64  )

cheers,
adrian

Diff Detail

There are a very large number of changes, so older changes are hidden. Show Older Changes

Added documentation and fixed typos.

aprantl updated this revision.Feb 14 2014, 7:54 PM

Rebased patch on ToT and addressed review comments. Most importantly this resolves a conflict with r201190, which added DW_OP_piece support for variables occupying only part of a register (essentially the complementary problem).

aprantl updated this revision.Feb 24 2014, 3:01 PM

Updated to also support variables split up by SelectionDAG’s integer type legalization. Thanks iains for providing additional testcases.

Hi Adrian,

This is a lot of great work, thanks!

I think the first question is how this is going to interact with DIBuilder::createComplexVariable? I think it should just be a concatenation of the two things. It looks like you have it as an instead? In general, I think I like the idea that variable locations just have expressions attached to them. And lacking anything else I think dwarf expressions are just fine here.

I've got more comments on the patch itself, but let's get this out of the way first.

-eric

Two high-level comments:

  1. This needs a set of direct SROA tests. I don't think these need to use the C-compiled-to-debug stuff as you can use stub metadata nodes (i've tried this in a couple of other tests and it works fine) and just check that the intrinsics are being adjusted the way you want and re-pointed in the way you want. You *might* need one layer of the debug part, but I generally feel like you can make this testable in a more "normal" IR test kind of way.
  1. Clang format. =] (ok, this is a minor comment)

Detailed questions below. The most important one is trying to understand how you handle un-split cases and iteratively split cases.

lib/Support/GraphWriter.cpp
90 ↗(On Diff #7324)

Uh, huh?

lib/Transforms/Scalar/SROA.cpp
426–429

This seems like a total hack.

It papers over the fact that this is not relevant for a particular store, but for the entire alloca.

Why can't we find this by locating the use of the alloca in the metadata and then the intrinsic using the metadata? I know it's a non-standard edge in the IR, but I was pretty sure it was still *an* edge in the IR somewhere that would let us find this?

690

Bad reformatting here. Generally, I would encourage you to use clang-format.

2236

What happens if the original store was already a split? This will happen a lot with SROA when we iterate on the same alloca.

Also, what about when this *isn't* a split at all? I feel like the first patch should just migrate dbg.declares from old allocas to new allocas when we're rewriting uses but not re-slicing anything.

2239

Note that you would want SliceSize here, as if we split up the store itself, BeginOffset and EndOffset won't reflect that split.

You should add a test case for this as well -- example would be storing the low and high parts of an i64 separately, as well as storing the whole thing.

2241–2244

I'm probably missing something, but it *looks* like this has the real possibility of taking code which has no partial info (no dbg.value calls) and making it suddenly have more info. Is that right? Is that good? Should we be looking for relevant dbg.value calls pertaining to the original alloca and slicing them up rather than just synthesizing our own?

Also, what about memset? memcpy? speculated stores around phis? I assume follow-up patches, but it might be good to document some of that.

3173

Why is it important to thread a single DIBuilder through all of these levels? Why not just create the DIBuilder in the AllocaSliceRewriter?

Or if it is expensive to construct, create it as a member?

aprantl updated this revision.Mar 5 2014, 4:23 PM

On Feb 26, 2014, at 3:06, Chandler Carruth <chandlerc@gmail.com> wrote:

Hi Chandler/Eric/David,

Thanks for your feedback, everyone!
I think I addressed every comment now, below are some notes regarding the non-obvious or controversial changes.

Two high-level comments:

  1. This needs a set of direct SROA tests.

There are now three tests that explicitly run opt -sroa and check the IR output. The IR output is then also compiled with llc to verify the dwarf output, so they are still located in the tests/DebugInfo directory.

Detailed questions below. The most important one is trying to understand how you handle un-split cases and iteratively split cases.
...
What happens if the original store was already a split? This will happen a lot with SROA when we iterate on the same alloca.

This is not an issue because we can create a variable piece from an existing variable piece just fine.

Also, what about when this *isn't* a split at all? I feel like the first patch should just migrate dbg.declares from old allocas to new allocas when we're rewriting uses but not re-slicing anything.

I moved the code that migrates the debug declares from the Store handler to the AllocaSlice visitor of AllocaRewriter. It also handles the case were the alloca is just rewritten without slicing. Currently it’s using the DIType of the DIVariable to determine whether this should be a VariablePiece or not. This is not strictly necessary, though: it forces us to build a DITypeIdentifierMap in SROA, but it saves us 3 int32 per variable MDNode when the alloca is not split. Alternatively we could also just always create pieces, and perform the check whether a piece covers the entire variable back in AsmPrinter.

Comment at: lib/Transforms/Scalar/SROA.cpp:426-429
@@ -419,1 +425,6 @@

+ Make a best effort to find a dbg.declare intrinsic describing
+
the alloca by peeking at the next instruction.
+ if (DbgDeclareInst *DDI=dyn_cast_or_null<DbgDeclareInst>(SI.getNextNode()))
+ S.DbgDeclare = DDI;
+
————————

This seems like a total hack.

It papers over the fact that this is not relevant for a particular store, but for the entire alloca.

Why can't we find this by locating the use of the alloca in the metadata and then the intrinsic using the metadata? > I know it's a non-standard edge in the IR, but I was pretty sure it was still *an* edge in the IR somewhere that would let us find this?

The current implementation uses a DenseMap<AllocaInst, DbgDeclare> to find the debug intrinsic for an alloca.

Comment at: lib/Transforms/Scalar/SROA.cpp:2241-2244
@@ +2240,6 @@
+ if (Var.getType().getSizeInBits() < Size*8 /* Assuming 8 Bits/Byte */) {
+ DIVariable Piece = DIB.createVariablePiece(Var, BeginOffset, Size);
+ DIB.insertDbgValueIntrinsic(V, 0, Piece, Store)
+ ->setDebugLoc(DbgDecl->getDebugLoc());
+ Pass.DeadInsts.insert(DbgDecl);
+ }
————————

I'm probably missing something, but it *looks* like this has the real possibility of taking code which has no partial info (no dbg.value calls) and making it suddenly have more info. Is that right? Is that good?

Maybe I’m misunderstanding what you said there, but as far as I understand it this:

Should we be looking for relevant dbg.value calls pertaining to the original alloca and slicing them up rather than just synthesizing our own?

is describing exactly what we are doing right now. Also note that only dbg.declare instrinsics can describe allocas.

Also, what about memset? memcpy? speculated stores around phis? I assume follow-up patches, but it might be good to document some of that.

This is not yet complete and I plan to add other cases over time!

aprantl updated this revision.Mar 7 2014, 1:26 PM

Here is a rebased version of the patch that applies cleanly after Chandler’s movement of header files and Eric’s creation of AsmPrinterDwarf.

(sorry for delays, trying to get to this one...)

aprantl updated this revision.Mar 18 2014, 5:55 PM

Rebased on r204198.
Moved the DWARF expression output for DW_op_pieces out of emitDebugLocEntry::emitDebugLocEntry() and into a DWARFPieceEmitter. Should be more readable this way.

Hi Eric & Chandler,

it’s been a while, and I was just thinking about what we could do to make this move forward a little faster. I understand that the more time is passing the patch grows larger and large which makes it harder to review. We could, for instance split out the SROA changes into a separate patch, and then focus on the IR and DwarfDebug changes initially, with the type legalizer providing the necessary test cases.

cheers,
adrian

Mostly commenting on the SROA things.

lib/Transforms/Scalar/SROA.cpp
970–972

If these are actually indexed by AllocaInst*s, why not use that as the key type?

975–976

This is a pretty terrible interface... Why not just lazily create this and store it in the Module so that the Module provides a direct method to produce a TypeIdentifierMap?

Also, why Optional rather than a pointer? I assume TypeIdentifierMap is some kind of smart pointer?

2107–2108

I don't think this belongs in the slice rewriter. The rewriter is about rewriting uses in the slice vector.

There is a perfect place for this in the rewritePartition method right after we create the new AllocaInst.

2116–2117

This being an assert worries me. Does the verifier ensure this?

2124

This doesn't make sense to me. We may see the same DbgDecl many times? Wouldn't this become dead only when OldAI became dead?

Let's look into splitting this up and out a bit. I think being able to have complex addresses as part of a location in the debug value instruction should be fine. The multiple optional statements bother me a bit, I guess we'll need to add a null metadata in cases where you want to use this.

Let's split the patch out with just the infrastructure changes needed and a testcase to use them and we'll go from there? In general, I think it's a good idea, just a big patch :)

-eric

Let's look into splitting this up and out a bit. I think being able to have complex addresses as part of a location in the debug value instruction should be fine. The multiple optional statements bother me a bit, I guess we'll need to add a null metadata in cases where you want to use this. What about turning it into a non-optional argument? Just keep a NULL for a complex address? How badly does that affect the existing tests?

Let's split the patch out with just the infrastructure changes needed and a testcase to use them and we'll go from there? In general, I think it's a good idea, just a big patch :)

i'd probably split out the "optimize" as well. Yeah, we'll emit non-optimal for a couple of patches, but as an add-on that seems like a more obvious patch.

I echo Chandler's thoughts about the type uniquing map. What's going on there?

Let's raise this back to everyone's queue again. Had some questions on the type uniquing map, but I think most of the prereqs were already reviewed and so this can be rebased?

aprantl updated this revision to Diff 12121.Aug 1 2014, 4:12 PM

Rebased on today's trunk. Otherwise mostly unchanged so far, I have a couple of questions regarding Chandler's comments that I'd like to clarify before the next iteration.
Thanks for all the feedback so far!

I'm curious why there are both SROA and SelDAG changes in this together - I assume the SelDAG (and any other LLVM CodeGen) changes could be separated/tested with raw IR inputs and the SROA changes should be implemented/tested as standalone IR->IR behavior, right?

So is there more separation to do in this patch, or are these changes somehow inextricably linked?

lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1285 ↗(On Diff #12121)

Remove commented out code (& unrelated whitespace changes) - then this file shouldn't be included in the diff at all.

Ideally this whole change shouldn't have any changes to core debug info APIs/IR APIs - those changes have already been committed. This change is now just about teaching certain optimizations to preserve/create the IR necessary to describe the variable location fragments, right?

dblaikie edited the test plan for this revision. (Show Details)Aug 1 2014, 4:27 PM
dblaikie removed a reviewer: deleted.

Added some questions/comments to Chandler's questions. Thanks!

lib/Transforms/Scalar/SROA.cpp
970–972

They could also be function arguments, not just allocas.

975–976

The main concern was memory usage. I wanted only create this on demand when it's used and free the memory after the pass is done with the module. That said, I'm not overly attached to this model.

Got rid of the optional.

2107–2108

What I'm doing here is replace the one dbg.declare describing the old alloca with one for each of the slices. IF I understand it correctly :-) it appears that rewritePartition would be too early for this because it didn;t yet create the slices yet.

2116–2117

The verifier (DIVariable::Verify()) cannot ensure this, because it would need access to a DITypeIdentifierMap to look up the type. There is, however, the same assertion in DwarfDebug just before the Dwarf expression it would be emitted.

If we *do* make the DITypeIdentifierMap a feature of the module we could have the verifier take care of it.

2124

The goal is to replace the single DbgDecl that describes the old alloca with many smaller dbg instrinsics that describe the individual slices.

aprantl updated this revision to Diff 12311.Aug 8 2014, 1:05 PM

Splitting out TypeLegalizer changes into a separate review.

The type legalizer changes can be found at http://reviews.llvm.org/D4831.

And both of these patches are predicated on http://reviews.llvm.org/D3373 right?

aprantl updated this revision to Diff 12313.Aug 8 2014, 1:56 PM

Removed extra whitespace from patch.

dblaikie edited the test plan for this revision. (Show Details)Aug 20 2014, 7:46 PM
dblaikie added a reviewer: chandlerc.
dblaikie removed a subscriber: chandlerc.

Bumping this hopefully into Chandler's queue (by setting him as a reviewer, rather than just a subscriber) and some minor feedback from me on the test cases.

test/DebugInfo/X86/sroasplit-2.ll
9

Perhaps these two test cases could be named more precisely for what they do - it took me a bit of staring to see the specific difference between them (the type of the Inner::b member is different) and I still don't have any idea why that's important to test separately (what is it that's different about that case)?

(if it is different and different code is needed for it, usual question applies about "is this different code that can be implemented/tested/committed separately", but perhaps it isn't)

& do both cases need the two levels of indirection (a struct, with an array of other structs)? While a test case that covers a few cases in one go is soemtimes nice, it'd help me at least if there was a comment explaining why the different pieces of indirection were important to the test - what parts of SROA they're exercising.

aprantl updated this revision to Diff 15262.Oct 22 2014, 11:18 AM

Rebased on todays trunk.

This patch teaches SROA to split the debug info for aggregate variables into pieces describing the individual scalar elements of the former aggregate.

Chandler, could you please review my responses to you previous review?

aprantl updated this revision to Diff 15263.Oct 22 2014, 11:21 AM

Fix an initializer for MSVC compatibility.

To finish reviewing this, it would help to have a full context patch? If that's a problem, I can patch it locally and take a look...

lib/Transforms/Scalar/SROA.cpp
959

These are of Value *s, but what Values? The loads? The allocas? Something else?

Essentially, I think this comment needs to indicate what the synthetic "use" edge looks like for the purpose of SROA.

2117–2118

The order here seems a bit odd. I would group the references separately from the builders....

Also, nit-picky issue, clang-format please. =]

3724

Bad indentation...

aprantl updated this revision to Diff 17303.Dec 15 2014, 2:10 PM

Rebased on trunk and addressed Chandler's feedback.

Thanks for the full context diff, it did help. =D

lib/Transforms/Scalar/SROA.cpp
2112

You don't need this -- the SROA pass object is available directly as "Pass".

2166–2177

Actually, why do this here at all? This doesn't seem to really be related to rewriting the uses in the traditional sense. Notably, it seems excessive if OldAI == NewAI (which happens in some cases).

I think the better place to put this logic is inside the rewritePartition method of the SROA pass itself right after we create the new alloca.

3713–3714

This should be a dyn_cast to AllocaInst and a re-use.

Also, getAddress can return null which seems like it would crash here. Unless the verifier rejects such an debug declare inst (and if it does, it would be nice to have an "unsafe" version of the method that returns null for the verifier and have this method assert for us that the result is non-null) this should handle the null case gracefully.

Somewhat separately (and apologies if this just is my ignorance of debug info) what cases would you expect where the address isn't directly an alloca inst?

aprantl added inline comments.Dec 18 2014, 1:42 PM
lib/Transforms/Scalar/SROA.cpp
2112

Thanks!

2166–2177

Done.

3713–3714

See the first testcase:

define i32 @foo(%struct.Outer* byval align 8 %outer) #0 {
  call void @llvm.dbg.declare(metadata %struct.Outer* %outer, metadata !25, metadata !2), !dbg !26

it could be an argument.

aprantl updated this revision to Diff 17469.Dec 18 2014, 1:43 PM

Addressed Chandler's feedback / moved the debug intrinsic generation right after we create the new alloca.

aprantl updated this revision to Diff 17470.Dec 18 2014, 1:46 PM

Changed isa/cast to dyn_cast.

aprantl added inline comments.Dec 18 2014, 1:52 PM
lib/Transforms/Scalar/SROA.cpp
3713–3714

No the verifier does not reject it, although an UndefValue would be preferred over of a nullptr. I'll change it to a dyn_cast_or_null.

aprantl updated this revision to Diff 17472.Dec 18 2014, 1:53 PM

s/dyn_cast/dyn_cast_or_null/

The code is looking quite good at this point. I'm mostly looking for some cleanups / clarifications of things I don't understand.

lib/Transforms/Scalar/SROA.cpp
3245–3247

This predicate is somewhat subtle. You're creating a piece expression any time the alloc size isn't the same, but you're using the slice size rather than the alloc size to create the piece.... Can you add a comment here? If I came to this code later, I wouldn't understand why the different sizes used were significant.

3717

Is there a particular cost to constructiog DIBuilder objects? I ask because, were this an IRBuilder, I would have said to just construct it directly in each place we need it, because there is no real cast associated with it. If the same is true for DIBuilder, then I'd suggest the same change. If DIBuilder works differently, I'd like to know how to better use it. =]

test/DebugInfo/X86/sroasplit-1.ll
4

Honestly, I really dislike seeing rdar links even in tests. They add no information for any reader not at Apple, and so you end up needing to explain all of the context of the test anyways. PR links are at least a reasonable thing for your average contributor to go read.

aprantl added inline comments.Dec 22 2014, 12:28 PM
lib/Transforms/Scalar/SROA.cpp
3245–3247

I changed condition and comment to this, which should be more clear:

// Create a piece expression describing the slice, if the new slize is
// smaller than the old alloca or the old alloca already was described
// with a piece. It would be even better to just compare against the size
// of the type described in the debug info, but then we would need to
// build an expensive DIRefMap.
if (SliceSize < DL->getTypeAllocSize(AI.getAllocatedType()) ||
    DbgDeclares[AI].getExpression().isVariablePiece())
  Piece = DIB->createPieceExpression(BeginOffset, SliceSize);
3717

Compared to an IRBuilder it looks as if DIBuilder has to initialize quite a few members (that will never be used). But I'm unsure whether there actually is a cost to initializing DenseMaps and SmallVectors? Otherwise I'd be happy to instantiate the DIBuilder on the fly. Alternatively we could factor out a less capable DIBuilder base class that does not have all these members.

Module &M;
LLVMContext &VMContext;

MDNode *TempEnumTypes;
MDNode *TempRetainTypes;
MDNode *TempSubprograms;
MDNode *TempGVs;
MDNode *TempImportedModules;

Function *DeclareFn;     // llvm.dbg.declare
Function *ValueFn;       // llvm.dbg.value

SmallVector<Metadata *, 4> AllEnumTypes;
/// Track the RetainTypes, since they can be updated later on.
SmallVector<TrackingMDNodeRef, 4> AllRetainTypes;
SmallVector<Metadata *, 4> AllSubprograms;
SmallVector<Metadata *, 4> AllGVs;
SmallVector<TrackingMDNodeRef, 4> AllImportedModules;

/// \brief Track nodes that may be unresolved.
SmallVector<TrackingMDNodeRef, 4> UnresolvedNodes;
bool AllowUnresolvedNodes;

/// Each subprogram's preserved local variables.
DenseMap<MDNode *, std::vector<TrackingMDNodeRef>> PreservedVariables;
test/DebugInfo/X86/sroasplit-1.ll
4

I can understand that. Removed.

aprantl updated this revision to Diff 17568.Dec 22 2014, 12:29 PM

Address previous round of feedback.

LGTM with a minor change below, thanks for all the work explaining things. =]

lib/Transforms/Scalar/SROA.cpp
3717

Interesting.

I suggest you just build them on the fly here, and if these members ever end up being a problem, we can go in and avoid their cost much as you describe. No need to do that until we see a problem, but it seems nicer than having a heap-allocated builder.

aprantl updated this revision to Diff 17571.Dec 22 2014, 1:29 PM

Thanks for the review so far!
This

  • Initializes DIBuilder on the fly.
  • Improves the DebugInfo/X86/array2.ll testcase based on the new capabilities
  • Fixes a compile error in the above-cited expression.
aprantl accepted this revision.Dec 22 2014, 2:23 PM
aprantl added a reviewer: aprantl.

Closing. Thanks everyone!!

This revision is now accepted and ready to land.Dec 22 2014, 2:23 PM
aprantl closed this revision.Aug 18 2015, 10:41 AM