This is an archive of the discontinued LLVM Phabricator instance.

Fixed debug info generation for function static variables, typedef, and records
Needs ReviewPublic

Authored by aaboud on Jul 14 2015, 1:47 AM.

Details

Reviewers
dblaikie
Summary

A Fix for Bugs PR19238 & PR24008.
This patch introduces an improvement to "D9758", which did not consider optimized code (especially inline).

Requires D12426 to be committed first.

Changes include:

  1. Allow creating DIEs that have a lexical block scope without attaching them to a context.
  2. Collect these DIEs in information container associated with the lexical scope.
  3. Collect abstract, inline, and concrete lexical block DIEs in information container associated with the lexical scope.
  4. Assure lexical scope contains local DI nodes to have an entry in the abstract function, if such exists.
  5. At "endModule()", move collected local DIEs to the abstract lexical block DIE, if exists. Otherwise, move them to the concrete lexical block.

Even with this patch we still have the following issues (Could be solved in separate following patch/es):

  1. types and local static variables declared in the function scope are still handled as before, i.e. will always be emitted in the concrete function, even when an abstract function exist.
  2. In the inline lexical block there is no mention for the type and local static variable, even not an entry with "abstract_origin" attribute.
  3. The lexical block hierarchy of inline function might defer from one inline site to another, however, the abstract function contains a unification of all these hierarchies.

Diff Detail

Repository
rL LLVM

Event Timeline

aaboud updated this revision to Diff 29652.Jul 14 2015, 1:47 AM
aaboud retitled this revision from to Fixed debug info generation for function static variables, typedef, and records.
aaboud updated this object.
aaboud added a reviewer: dblaikie.
aaboud set the repository for this revision to rL LLVM.
aaboud added subscribers: friss, probinson, llvm-commits.

Any comment on this approach?

dblaikie added inline comments.Jul 24 2015, 12:59 PM
lib/CodeGen/AsmPrinter/DwarfDebug.h
267

I can't quite follow it - why are these two data structures necessary now (but weren't necessary prior to your proposed change)?

aaboud added inline comments.Jul 26 2015, 12:19 AM
lib/CodeGen/AsmPrinter/DwarfDebug.h
267

In my original solution I did not consider optimized code, especially inline functions.
Now as I need to take care of these cases, it led us to the need for handling function static variables for function been inlined.

Notice that in the today implementation, where the function static variable is always associated to the function regardless of the lexical block it is defined in, the function static variable is being defined once even when the function is being inlined.
This means that you will not have access to that variable from the caller function.

With this patch, I am not just solving the static lexical block issue but also assure have access to that static variable from the caller function.

If you think that I should not bother solving this debugging issue, I might be able to do less changes and keep the current behavior (of losing debug info when inline).

What do you think?

aaboud updated this revision to Diff 30718.Jul 27 2015, 12:32 PM
aaboud updated this object.

Removed the handle of abstract/concrete function static variables, following comment from David, and will discuss it in separate thread.

David,
Do you want to comment on the last uploaded patch?

dblaikie edited edge metadata.Aug 10 2015, 9:08 AM

Thanks again for your reminders and patience...

lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
341–342

Could probably just roll the local variable into the for loop header.

343–344

Should we be adding imported entity DIEs in inlined or abstract definitions? Not sure... I suppose maybe.

What does it mean not to add the types and static variables to these cases? (if we inline away all uses of a function - won't we lose the local types and static variables & the user won't be able to print/interact with them?)

352–353

Formatting (run clang-format over your change?)

354–356

Doesn't "getOrCreateTypeDIE" already do the getDIE check? (that would be implied by the "get" part of "get or create" in the name, I would hope)

372–374

I'm somewhat confused about what this code is about/for. Could you help explain it further?

lib/CodeGen/AsmPrinter/DwarfUnit.cpp
743–744

When do we do this while not skipping context?

(also - creating a lexical block that doesn't contain any code may be rather useless - the debugger will never be at a location inside that scope, so it may be difficult to access the variables, etc, there if the debugger implements correct scoping rules... - but I'm not sure. What do other compilers do here?)

test/DebugInfo/X86/lexical-block-inline.ll
21

Rather than using -O2 which changes a bunch of other stuff in the debug info too, it would be preferable to use no optimizations and add attribute((alwaysinline)) (or is it forceinline? I always forget the name) to the function that must be inlined.

Thanks David for your comments. I tried to answer all of them.
Hopefully, the answers are clear.

Also, I agreed to fix some code according to your comments, but I prefer to hear your reply on the other comments before I upload a new patch.

lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
341–342

I will.
I just thought that "findLocalDeclNodesForScope()" function is not cheap, and was not sure that calling it in the "for" loop header is guaranteed to happen once.
If you think that it will not be a performance issue, I will move it to the "for" loop header.

343–344

First of all, notice that we agreed to keep the current behavior of how types/static variables are handled.
Right now, we are emitting them only in the original function (the one that did not get inline, regardless if it has a code or not), the only difference is that we used to emit them always in the function scope, and this patch assure emitting them in the right lexical scope.
We will discuss how to improve that in a separate thread once we are done with this patch.

Imported entries however, are correctly supported (also with inline) and they will always be added to the lexical scope, no matter if it is inline, abstract or regular scope. Thus, there is no need to change that.
If you ask me, once we handle the abstract/concrete static variable, we will end up emitting them too in all kind of lexical scopes.

352–353

I will do before I upload a new patch.

354–356

True, however, if the getDIE in the getOrCreateTypeDIE returns an entry we will not know that and will end up adding this entry twice to the parent scope. This could happen, because types might be created bottom-up and added to the lexical block before handling the lexical scope itself (top-down). This is not the case for static variables, they will never be created bottom-up, that is why we have only an assertion there.
This is why we need to check "getDIE" for types separately before calling "getOrCreateTypeDIE()" function.

If we can differ creating types till all lexical scopes are created, this will prevent the above case for types, but it is not trivial thing to do.

Another thing, which I thought to discuss later with the abstract/concrete static variables, is to emit the type in all kind of lexical scope (inline, abstract, regular). However, we need to discuss it separately as type entry might not be small like the imported entry.

372–374

Let's start from the lines that was not changed (385-386), these are clear, we are creating a new lexical scope.
And if we know that it is a regular lexical scope, we want to add it to the scope map (387-388).

However, because now we are adding regular lexical scope to the map, we want to make sure we do not have this lexical block already created.
It could be created already in case we created a type that belongs to this lexical scope (or to one of its children scopes), as I said before types are not differed till all lexical block are created!
So, we want to check in the map for the lexical scope (379) but only for regular scope (378).
And if we find the lexical scope in the map, this means that it was already added to its parent scope (380-383), so we mark that flag to prevent adding it twice to the parent scope (396-397).

When we discuss the abstract/concrete static variables and differing type creation, we might find a way to prevent this.
But for now, we will need this code.

lib/CodeGen/AsmPrinter/DwarfUnit.cpp
743–744

We do that because for some reason LLVM decide to emit debug info for functions that has no code (see the test PR24008.ll below).
even though the static member function "Run()" has no code and even when it was inline there is no instructions related to it, LLVM decided at endModule to create entry for this function, but it is a method, so it end up creating the class Bar type, which happens to be defined inside a lexical block that has no code!

If it was not a type, I would agree with you and say we should not be doing that, but type entries are needed to complete other entries, and we should emit them even if the parent scope of these types has no code.

GCC does that too, in his own way with his own gaps (if you ask me)...I still did not see a compiler that does this perfectly, hopefully we can eventually make LLVM do that the best.

test/DebugInfo/X86/lexical-block-inline.ll
21

I am not sure if this will work.
But I can run clang with -O0, then run inline/always-inline pass using opt if this sounds better to get less noise from other passes.
I will then comment that in the test description.

aaboud added a subscriber: aaboud.Aug 18 2015, 1:01 AM

David,
Do you agree with my answers to your comments?
Some of the answers will need a confirmation or a disagreement.

Thanks,
Amjad

dblaikie added inline comments.Aug 20 2015, 1:31 PM
test/DebugInfo/X86/PR24008.ll
17

What are the important features of this test? Why are they important? It might be helpful to comment them.

At the moment this looks like it could be simplified down to:

void f1() {
  while (true) {
    struct nested {
      static void f2() {}
    };
    nested::f2();
  }
}

Though even then, I'm not sure what the tricky part of this test is. You've tested nested types in other test cases - was there something in the code you had to write specifically to support this nested type compared to the other tests?

(is there something in your patch I could remove that would cause this test to fail, but not the other tests?)

David, thanks for the feedback.
I quoted your comments as it did not appear by itself, and answered your questions.
I hope that we are converging :)
Let me know if you agree with the below answers.
In the meanwhile, I will update the code with the few changes we already agreed on.

lib/CodeGen/AsmPrinter/DwarfCompileUnit.cpp
341–342

Ah, no worries. See the code sample in the "Explanation" section here: http://en.cppreference.com/w/cpp/language/range-for - that shows pretty much exactly the code the C++ range-for loop translates to.

343–344

"regardless of if it has a code or not" - I'm not quite sure what you mean by this, but I'll take a stab:

There are up to 3 different "functions" that we end up producing in the debug info:

  1. the abstract definition
  2. the concrete definition
  3. the inline definitions

If any inlining happens, we always produce (1) and (3), and if not all call sites were inlined, we produce (2) as well.

You're saying that currently we only emit local types and local static variables into (2) (thus if all calls are inlined, and (2) is not emitted, we do not emit these types and static variables at all? What do we do if there's a variable of that type in the inlined code? wow, we produce a weird little pseudo-(2) that just has the nested type in it... huh)

You got it right.
I meant that we will end up creating the concrete definition (2) "regardless of if it has a code or not".
This is because the function contains a local static variable, or alternatively it has a type declared inside it and used in one of the local variables in the inline version (3).
The perfect implementation would be to have both, the local static variable and the type declaration, defined in the inline (or abstract) version of the function. This could mean duplication of the debug info entries, but at least we will not lose debug info.

Do you have a test case for a situation like that? (maybe it's not relevant to your patch - if it's just existing weird behavior, but if you had to write code to preserve it, we should test it - even if it's not the desirable end state (& maybe it is the right end state, I'm not sure)) Here's the test case I was just playing with, I'm not sure it adds anything compared to your existing tests:

void f1(void*)
inline __attribute__((always_inline)) void f2() {
  struct foo {} f;
  f1(&f);
}
void f3() {
  f2();
}

In fact, the below tests cover all the corner cases I am discussing with you.
The case with the concrete function that has no code but still got an entry in the debug info to house the local static variable and the type declaration is covered by test "lexical-block-inline.ll", I just thought that we should not add the "CHECK" lines for the result, as we know that it is not how it should be.

If I understand you correctly, you are suggesting that I do add the "CHECK" lines, and once we fix this in the future, we can modify the test accordingly, correct?

354–356

When does this happen (that the type is created before the scope it is in)? Does one of your tests cover this (I guess the empty scope case in the PR24008.ll test exercises this - the scope is not created (because it is empty when the function's debug info is generated) but then later on we try to build the local type (to reference from the definition of Run()) and fail because there's no context to put it in?)?

Hmm, PR24008.ll exercises the case where the scope is built (& possibly non-existent) before the type, but not the other way around, which seems to be the case here. That the scope is created before we are generating code for this function and its scopes - what sort of source code leads to that situation? (I can think of some, but not sure if they're tested here, etc)

Actually, I thought that "lexical-block.ll" covers this, but it is not, so here is a test that does, (we can modify "lexical-block.ll" to do that too):

1.int foo(void) { 
2.  {
3.    typedef int Y;
4.    static Y c;
5.    return c;
6.  }
7.}

Let's take a look on lexical block starts at line 2. when we visit lexical block <2> we do the following:
a. Check for any imported entry, local static variable, or type and add them to the children list.
b. Create all its other children (the order between (a.) & (b.) is important to reduce the number of cases where the type can be created before the scope, but it does not solve all the cases)
c. Create the lexical block.

But when we check for local static variables and types in the "LocalDeclNodes" list, we could handle the local static variable "c" before the type "typedef Y" (order is not guaranteed, in fact the current order will assure "c" appears before "Y").
Once we create local static variable "c" we also creates its type, ending up with creating "Y" and adding it to the map before revisiting it later when we hit it in the "LocalDeclNodes" list.

Another example would be when we have inline function, but also the concrete function.
Notice that type will always be created in the concrete function, so if we end up creating the inline function first, when we get to the concrete function, the type was already created before the scope.

372–374

OK, so I think this & related parts of the change are still a source of confusion (for me) and complexity that I'm a bit concerned about.

It sounds like you're suggesting that scopes can be created both before a function is being processed (in the case of a nested type being needed prematurely - such as if that type is referenced by a function template, or the definition of one of its members, etc), and after (the same situation, but where the reference comes after the function and the scope had no code in it to begin with) - is that the case? Do you have clear test cases demonstrating both these situations?

Is that what causes all this complexity with regards to scope creation and access? Are there other weird corner cases? Can we simplify this at all?

I think you got it right, all the complexity is due to these both cases (1) where we might create the lexical scope before processing the function and (2) when we skip creating the lexical scope during processing the function but end up creating it later.
Examples:
(1) See the previous comment, I add an example there for this case, we can modify "lexical-block.ll" to cover this.
(2) "PR24008.ll" cover this case, when we process "Foo::TestBody()" function we skip lexical block "do {" because it has no code, but later we end up creating debug info entry for "Bar::Run()" function, which is a method of class "Bar" that is declared at lexical block "do {" so we end up creating the lexical block after we done processing the function.
By the way, we create entry for "Bar::Run() because of this code (Lazily construct the subprogram if we didn't see either concrete or inlined versions during codegen):

void DwarfCompileUnit::finishSubprogramDefinition(const DISubprogram *SP) {
  DIE *D = getDIE(SP);
  if (DIE *AbsSPDIE = DU->getAbstractSPDies().lookup(SP)) {
    if (D)
      // If this subprogram has an abstract definition, reference that
      addDIEEntry(*D, dwarf::DW_AT_abstract_origin, *AbsSPDIE);
  } else {
    if (!D && !includeMinimalInlineScopes())
      // Lazily construct the subprogram if we didn't see either concrete or
      // inlined versions during codegen. (except in -gmlt ^ where we want
      // to omit these entirely)
      D = getOrCreateSubprogramDIE(SP);
    if (D)
      // And attach the attributes
      applySubprogramAttributesToDefinition(SP, *D);
  }
}

Why doesn't the code end up looking more like:

/* find all the things for the scope */

if there are things
  getOrCreateScope
  add things to scope

It seems there are many more variants to this...

If it will be more clear I can move the code to "getOrCreateLexicalScope()" function, but this function will have to return both the lexical scope and an indicator if it was created or not.

(do we create the local scope chain when we create the local type before the outer function? That seems difficult... (because we might not want the intermediate scopes - they may be empty))

With the current implementation, we might end up with empty intermediate scopes when we creates local type before outer function. However, as you realized on your next comment, if we collapse these empty scopes, we might interfere with other names present there.

lib/CodeGen/AsmPrinter/DwarfUnit.cpp
743–744

Yes, that seems true - and we can't just remove their parent scope & collapse them into their grandeparent's scope (that would possibly shadow/interfere with other names present there).

I think it's quite right/the only thing to do, to emit them into an "empty" scope like that, when the type is needed.

It'd be interesting to talk about what /is/ perfect here. I see a lot of tradeoffs, but nothing terribly elegant... but perhaps there's an option

test/DebugInfo/X86/PR24008.ll
17

This test checks "when we skip creating the lexical scope during processing the function but end up creating it later" [see previous comments @line 389] , it is not the same like the other one.

However, you might be right about the simple test above.
I will verify that it covers the same case, and if it does, I will simplify the test using your example.

test/DebugInfo/X86/lexical-block-inline.ll
21

Certainly worth a shot. I've usually done this with all my inlining tests, if it's just inlining you need - it's a little harder to make an empty scope, but not impossible (force inline an empty function as the only code in the scope, I suppose). Even if you have to run optimizations to make a scope empty, probably still worth having the inline attributes to make it clear what's necessary to be inlined, etc.

aaboud updated this revision to Diff 33328.Aug 27 2015, 8:32 AM
aaboud updated this object.
aaboud edited edge metadata.

Improved the solution after discussion with David.
Please, take a look on the new description (Summary).

It'd be good to split this up as much as possible - it'll make review & commit history easier, etc.

Could you create a separate review just for the "don't omit declarations for unused (fully inlined and optimized away) functions" part that we discussed & you've included here?

After that I think, if I recall our conversation correctly, it might be handy to look at delaying the adding of static variables and local types until the end of the module - and put them in the abstract origin when there is one (rather than always putting them in the concrete definition - and losing them if there is no concrete definition? (that's the current behavior, yes?)).

Then after that, look at trying to keep track of the right scopes, the minimal scopes, etc?

It'd be good to split this up as much as possible - it'll make review & commit history easier, etc.

Could you create a separate review just for the "don't omit declarations for unused (fully inlined and optimized away) functions" part that we discussed & you've included here?

Yes, it will make sense to commit the " don't emit declarations for unused (fully inlined and optimized away) functions" separately, but before the other changes.
I will move this part out of this review and open a new one asap.

After that I think, if I recall our conversation correctly, it might be handy to look at delaying the adding of static variables and local types until the end of the module - and put them in the abstract origin when there is one (rather than always putting them in the concrete definition - and losing them if there is no concrete definition? (that's the current behavior, yes?)).

As you agreed, now the local DIEs (i.e. local static variable and local types) are created without being attached to a context and only in at "endModule()" function we call the "finishLexicalBlockDefinitions()" that will attach these DIEs either to the abstract lexical block (if available), or to the concrete lexical block.

Then after that, look at trying to keep track of the right scopes, the minimal scopes, etc?

The minimal scope is guaranteed as we still create lexical block the same way we did before, so there is no empty lexical block created.
However, you might see empty lexical block in the inline function, but that is only because in the equivalent lexical block in the abstract function there is a DIE.
After committing this patch, we might want to add a DIE to the inline lexical block with the "abstract_origin" attribute pointing to the DIE in the abstract lexical block.

Please, read the new summary, I tried to explain the current design/implementation plus what we might want to do as next step.

Right, what I'm suggesting is that part could be done separately from the scope handling stuff.

I understand the benefit of splitting code into as much patches as possible.

This change would ensure that the DIEs go in the right function (the abstract one or the concrete one) - which would fix bugs that occur when there is no concrete DIE, if I understand correctly?

Actually, what we agreed on that the DIEs would go to abstract one if it exists, otherwise to the concrete. So if we have abstract and concrete the DIE will be only in the abstract.
And this is what I implemented above.

For example, currently if you have a static local variable in a function that is inlined we produce an abstract definition, an inlined subroutine, but then we also produce this weird stub-concrete definition (a subprogram tag with /only/ an abstract_origin attribute, and the child DW_TAG_variable for the static variable). If we delay the addition of the local types and static variables and just put them in the abstract definition if there is one, we shouldn't end up producing the weird stub concrete definition (that's not really a concrete definition anyway).

But the “currently” you are mentioning is not true anymore, once we changed the design to create the DIE without attaching it to a scope, the weird stub-concrete definition is not created anymore.
So, unless we handle these DIEs at end of module processing and put them in the correct scope (abstract or concrete), we will end up having flying (unattached) DIEs.

So that would be a good, separable, incremental step. /then/ we can have a separate (3rd) patch or patches to try to put it in the right scopes?

We improved the design, there is no reason to go one step back, just to commit two patches that none of them make sense as itself.

aaboud updated this revision to Diff 33398.Aug 27 2015, 11:44 PM
aaboud updated this object.

Moved out "omitting debug info for function with no code" to separate patch (D12426).
Reverted format fixes on unrelated code.