
[llvm-dwarfdump][locstats] Unify handling of inlined vars with no loc
ClosedPublic

Authored by djtodoro on Feb 4 2021, 8:26 AM.

Details

Summary

The presence or absence of an inlined variable (or formal parameter) that has only a DW_AT_abstract_origin reference (and no DW_AT_location) should not change the reported location coverage.
That is, for both:

DW_TAG_inlined_subroutine
              DW_AT_abstract_origin (0x0000004e "f")
              DW_AT_low_pc  (0x0000000000000010)
              DW_AT_high_pc (0x0000000000000013)
  DW_TAG_formal_parameter
              DW_AT_abstract_origin       (0x0000005a "b")
NULL

and

DW_TAG_inlined_subroutine
            DW_AT_abstract_origin (0x0000004e "f")
            DW_AT_low_pc  (0x0000000000000010)
            DW_AT_high_pc (0x0000000000000013)
NULL

we should report 0% location coverage. If we add a DW_AT_location, the coverage should improve in both cases.

Diff Detail

Event Timeline

djtodoro created this revision.Feb 4 2021, 8:26 AM
djtodoro requested review of this revision.Feb 4 2021, 8:26 AM
djtodoro edited the summary of this revision. (Show Details)Feb 4 2021, 8:53 AM
jmorse added a comment.Feb 4 2021, 9:22 AM

Thanks for the patch -- do I understand correctly that any coverage of an inlined variable will lead to it being removed from the "InlinedVariables" vector? I'm wondering what happens if we have two inlining sites: one where an inlined variable is optimised out, another where the variable has a location. If we're keeping global data and deleting elements from it when coverage is observed, I think that'll make the output dependent on the order in which the inlining sites are encountered.

If a site without coverage is seen before a site with coverage, it'll get a zero-coverage record from InlinedVariables successfully. But if you flip the order, the site with coverage will erase from InlinedVariables, and the site without coverage won't know about the un-covered variable.

I think this patch also relies on the abstract DW_AT_inline copy of the subprogram being encountered before any call sites, and I don't know that that's guaranteed. (Fixing this is probably hard, hence I, uh, avoided looking at it.)

llvm/tools/llvm-dwarfdump/Statistics.cpp
30

Nit, this can probably be a SmallVector, right?

573–574

Thanks for the patch -- do I understand correctly that any coverage of an inlined variable will lead to it being removed from the "InlinedVariables" vector? I'm wondering what happens if we have two inlining sites: one where an inlined variable is optimised out, another where the variable has a location. If we're keeping global data and deleting elements from it when coverage is observed, I think that'll make the output dependent on the order in which the inlining sites are encountered.

If a site without coverage is seen before a site with coverage, it'll get a zero-coverage record from InlinedVariables successfully. But if you flip the order, the site with coverage will erase from InlinedVariables, and the site without coverage won't know about the un-covered variable.

Oh, I've forgotten to replace the vector with a map for concrete inlined instances.

I think this patch also relies on the abstract DW_AT_inline copy of the subprogram being encountered before any call sites, and I don't know that that's guaranteed. (Fixing this is probably hard, hence I, uh, avoided looking at it.)

Hm... yes... it could be a problem if that isn't guaranteed... I'll think about it.

djtodoro updated this revision to Diff 321695.Feb 5 2021, 3:35 AM
  • Replace the vector representing inlined vars with a map
  • Replace std::vector with SmallVector
  • Improve the test with multiple concrete inlined instances
jmorse added a comment.Feb 5 2021, 4:43 AM

Looking good, with a couple of comments; the harder case of the out-of-order DW_AT_inline copy still wants solving.

llvm/test/tools/llvm-dwarfdump/X86/locstats-for-inlined-vars.yaml
37–38

Could I request a third inlining site with no DIE at all for this variable -- that'll ensure we test the behaviour that D94976 is trying to create.

llvm/tools/llvm-dwarfdump/Statistics.cpp
460

This doesn't need to be in the outer scope, does it? IMO better to keep the declaration where it was, as it's only used in that block.

580–582

I get the feeling modifying this vector while iterating over it is vulnerable to re-ordering / re-allocating.

jmorse added a comment.Feb 5 2021, 5:08 AM

Thinking out loud, and motivated by memory pressure: the per-inlining-site record doesn't need to be maintained before or after we visit the inlining site, right? We might be able to get away with keeping a "current scope's inlined variables" vector on the stack and passing it down through collectStatsRecursive. That'll automagically release the vector when we're done with the scope.

llvm-dwarfdump --statistics already takes a while to examine a stage2reldeb clang binary, it'd be best to avoid allocating more global data than we need.

djtodoro added a comment.EditedFeb 5 2021, 5:56 AM

@jmorse Thanks for the review!

Thinking out loud, and motivated by memory pressure: the per-inlining-site record doesn't need to be maintained before or after we visit the inlining site, right? We might be able to get away with keeping a "current scope's inlined variables" vector on the stack and passing it down through collectStatsRecursive. That'll automagically release the vector when we're done with the scope.

llvm-dwarfdump --statistics already takes a while to examine a stage2reldeb clang binary, it'd be best to avoid allocating more global data than we need.

It makes sense. I tried that approach, but I ran into an issue and dropped that implementation. I'll try it that way again now, since we have a working prototype.

llvm/test/tools/llvm-dwarfdump/X86/locstats-for-inlined-vars.yaml
37–38

Sure. I'll add it in the next revision.

llvm/tools/llvm-dwarfdump/Statistics.cpp
460

It is being used at the end of the function again.

580–582

Right... It will be removed completely if we keep the vector on the stack.

djtodoro updated this revision to Diff 321751.Feb 5 2021, 6:53 AM
  • Maintain one array of inlined vars per inlined instance by creating it on the stack
  • Improve the test

TODO: Handle the case when a DW_TAG_subprogram with DW_AT_inline comes after its inlined instances

djtodoro updated this revision to Diff 322073.Feb 8 2021, 3:48 AM
  • Code refactor
  • Handle the case when a DW_AT_inline copy of subprogram is out of order
    • Add test for that

Looking good; a few comments inline, thanks for pushing forwards.

llvm/tools/llvm-dwarfdump/Statistics.cpp
471–473

Should this be setting InlinedVarsPtr to nullptr? Inlined variables from a higher-up-the-stack inlining site might still be present and pointed to by the InlinedVarsPtr argument.

666–668

Isn't this "removing" twice, once with std::remove and once with the erase method?

671

Hmmmmm, this isn't going to collect variables that are nested within a DW_TAG_lexical_block, I guess, and fixing that probably requires more recursion. Ouch.

731

I reckon we'll need to bump the version number for this.

dblaikie added inline comments.Feb 11 2021, 9:59 AM
llvm/tools/llvm-dwarfdump/Statistics.cpp
666–668

Ah, the joys of C++. This code is correct: it's an application of the classic "erase-remove" idiom ( https://en.wikipedia.org/wiki/Erase%E2%80%93remove_idiom ).

Though we do have LLVM wrappers that tidy this up a bit: llvm::erase_if and llvm::erase_value.

djtodoro added inline comments.Feb 12 2021, 6:43 AM
llvm/tools/llvm-dwarfdump/Statistics.cpp
471–473

It makes sense.

506–514

This piece of code needs some changes in order to handle lexical scopes...

666–668

Yeah, it is an idiom for erasing elements from STL containers.

@dblaikie I think using the std:: ones is the recommended approach, right? Or are you proposing using the llvm:: ones?

671

Oh yeah... inlined functions can also have lexical scopes... I will add a dedicated test case and handle it.

731

I thought we should update this only when adding new metrics...
Thanks! I will update it.

dblaikie added inline comments.Feb 12 2021, 11:58 AM
llvm/tools/llvm-dwarfdump/Statistics.cpp
666–668

generally we use the llvm ones; that's why we have them - they simplify the API (making it range-based instead of iterator-pair based, and in this case wrapping the common erase+remove idiom into a single, more legible function call, avoiding the confusion @jmorse ran into when reading this code, for instance)

djtodoro added inline comments.Feb 17 2021, 7:38 AM
llvm/tools/llvm-dwarfdump/Statistics.cpp
666–668

OK, it sounds reasonable to me. Thanks.

djtodoro updated this revision to Diff 324312.Feb 17 2021, 8:01 AM
  • Handle nested nodes (added a test for that)
  • Use llvm::erase_value instead of the erase-remove idiom

llvm-dwarfdump --statistics already takes a while to examine a stage2reldeb clang binary, it'd be best to avoid allocating more global data than we need.

(Off this concrete topic) D96871 is a small optimization of llvm-dwarfdump --statistics.

jmorse accepted this revision.Feb 19 2021, 2:35 AM

Ship it!

This revision is now accepted and ready to land.Feb 19 2021, 2:35 AM

This patch seems to be causing a slowdown when running llvm-dwarfdump --statistics. It looks like more than 50% of the time is spent in the collectZeroCovInlinedVars function.

In the function collectStatsForObjectFile, the variable InlinedFnsToBeProcessed is created outside the for loop. It looks like the inlined functions to be processed are appended and processed multiple times. I verified that by moving the creation of that variable into the for loop, the slowdown goes away.

Can you please take a look?

Thanks for finding this! It does solve the problem -- do you want me to patch it, or would you like to post it yourself?

After moving the variable InlinedFnsToBeProcessed inside the for loop, I noticed a difference in two dwarfdump statistics for a quite large binary built with fission=yes:

#variables processed by location statistics
#variables with 0% of parent scope covered by DW_AT_location

Since I am not very familiar with this code, I would request that you investigate and apply the patch.

Hmm, looking at this and trying to reproduce the behavior makes me think this wouldn't be the right fix, and the change in numbers is probably reflective of a bug in the fix - specifically in LTO situations (where the abstract origin of the function may be in a different CU than the one it's referenced from/inlined into).

But there might be a reasonable alternative fix/improvement - pruning the contents of the list once it has been handled, and possibly sorting the list so it's easy to visit only the elements that need it.

I have taken a look into this and came to a similar conclusion.
This can be reproduced with Split DWARF -- building e.g. the GDB project with the -gsplit-dwarf option enabled triggers the issue. I was struggling to make a simple test case (from a high level) that meets all the requirements, but it is very hard. That is why I've sent this email: https://lists.llvm.org/pipermail/llvm-dev/2021-April/149735.html.

Ah, hmm - why would this come up in Split DWARF, do you think? Split DWARF doesn't have any cross-CU references (the format has no way to encode them) so I'd expect no DIEs to be added to the InlinedFnsToBeProcessed list?

The example I had in mind is a bit different (I haven't looked into cross-CU support yet, but we should have covered it).
Let me first explain the two main variables here:

GlobalInlinedFnInfo -- keeps track of DW_TAG_subprogram copies with DW_AT_inline, so that when we encounter a DW_TAG_inlined_subroutine we know how many of its variables have 0% location coverage
InlinedFnsToBeProcessed -- since the DWARF standard does not mandate an order, we may encounter a DW_TAG_inlined_subroutine before the DW_TAG_subprogram copy with DW_AT_inline that it references; this variable keeps those DW_TAG_inlined_subroutine DIEs so they can be processed after we have finished traversing the DWARF tree

Now let us consider a big project (let it be GDB) compiled with -gsplit-dwarf. The main executable will have a debug_info section like:

$ llvm-dwarfdump gdb
0x0000000b: DW_TAG_compile_unit
           DW_AT_stmt_list   (0x00000000)
           DW_AT_comp_dir    ("/gdb")
           DW_AT_GNU_pubnames        (true)
           DW_AT_GNU_dwo_name        ("gdb.dwo")
           DW_AT_GNU_dwo_id  (0xb8ed71b1bb823706)
           DW_AT_low_pc      (0x0000000000482370)
           DW_AT_high_pc     (0x000000000048239a)
           DW_AT_GNU_addr_base       (0x00000000)
0x00000030: Compile Unit: length = 0x00000030, format = DWARF32, version = 0x0004, abbr_offset = 0x001a, addr_size = 0x08 (next unit at 0x00000064)

0x0000003b: DW_TAG_compile_unit
           DW_AT_stmt_list   (0x00000069)
           DW_AT_comp_dir    ("/gdb")
           DW_AT_GNU_pubnames        (true)
           DW_AT_GNU_dwo_name        ("amd64-tdep.dwo")
           DW_AT_GNU_dwo_id  (0x97c5a9e4ce1a43bf)
           DW_AT_GNU_ranges_base     (0x00000000)
           DW_AT_low_pc      (0x00000000004823a0)
           DW_AT_high_pc     (0x000000000048e982)
           DW_AT_GNU_addr_base       (0x00000018)
0x00000064: Compile Unit: length = 0x0000002c, format = DWARF32, version = 0x0004, abbr_offset = 0x0037, addr_size = 0x08 (next unit at 0x00000094)

0x0000006f: DW_TAG_compile_unit
....

We will keep filling both of these variables (GlobalInlinedFnInfo and InlinedFnsToBeProcessed) for all the CUs, but there could be a situation where we have inlined_subroutine referencing a subprogram, but we accidently use variables info from subprogram from previous CU with the same DIE offset -- that is why we have different numbers for this case.