This is an archive of the discontinued LLVM Phabricator instance.

[llvm-dwarfdump][Statistics] Unify coverage statistic computation
ClosedPublic

Authored by krisb on Nov 21 2019, 10:05 AM.

Details

Summary

The patch removes OffsetToFirstDefinition in the 'scope bytes total' statistic computation. Thus it unifies the way the scope and the coverage buckets are computed. The rationals behind that are the following:

  1. OffsetToFirstDefinition was used to calculate the variable's life range. However, there is no simple way to do it accurately, so the scope calculated this way might be misleading. See D69027 for more details on the subject.
  2. Both 'scope bytes total' and coverage buckets seem to be intended to represent the same data in different ways. Otherwise, the statistics might be controversial and confusing.

Note that the approach gives up a thorough evaluation of debug information completeness (i.e. coverage buckets by themselves doesn't tell how good the debug information is). Only changes in coverage over time make a 'physical' sense.

Event Timeline

krisb created this revision.Nov 21 2019, 10:05 AM
Herald added a project: Restricted Project. · View Herald TranscriptNov 21 2019, 10:05 AM
krisb edited the summary of this revision. (Show Details)Nov 21 2019, 11:38 PM
krisb added a project: debug-info.

@krisb Thanks for working on this!

Let me summarize this quickly.

This will remove mixing between calculation against enclosing and adjusted scope. We have concluded the adjustment heuristics are confusing and wrong in many cases, so we are giving up on that. The alternative is to generate the DW_AT_start_scope attribute, that represents exactly the adjusted scope, and having that we will be able to calculate the debug location statistics for the pieces of code where the variable is alive.

WDYT how "easy" will be to implement the DW_AT_start_scope attribute? I know that it might not be used anywhere else, but I think that having location statistics against variable's life will be very useful thing.

Please refactor the llvm/utils/llvm-locstats/llvm-locstats.py as well (there are variables containing *FromFirstDefinition).

cmtice added a subscriber: cmtice.Nov 22 2019, 9:25 AM

@krisb Thanks for working on this!

Let me summarize this quickly.

This will remove mixing between calculation against enclosing and adjusted scope. We have concluded the adjustment heuristics are confusing and wrong in many cases, so we are giving up on that. The alternative is to generate the DW_AT_start_scope attribute, that represents exactly the adjusted scope, and having that we will be able to calculate the debug location statistics for the pieces of code where the variable is alive.

WDYT how "easy" will be to implement the DW_AT_start_scope attribute? I know that it might not be used anywhere else, but I think that having location statistics against variable's life will be very useful thing.

See my reply from D69027 - though thinking more closely, I'm not sure start_scope would really answer the question as it seems like it has many of the same issues as the adjusted scope we're removing - it assumes the code is laid out in order. Any use of DW_AT_ranges would invalidate it (& I /think/ many & perhaps most scopes are fragmented into ranges in optimized code which is where any of these stats are interesting) & basic block reordering, etc, is sort of the nail in the coffin.

"Yeah, that could be handy - but probably pretty complicated to propagate through the compiler & only have limited benefits to users (in normal debugger scenarios, rather than stats) - the difference between "this variable is out of scope" (or using another variable that would otherwise have been shadowed - that's pretty important) and "this variable's value is optimized out". The complexity of the implementation might be why it's not been implemented - possible that an implementation might be justifiable in a context of bigger improvements like is_stmt accuracy or sub-expression source range grouping, or other ideas that've been thrown around from time to time."

krisb updated this revision to Diff 230796.Nov 24 2019, 9:23 AM

Refactored llvm/utils/llvm-locstats/llvm-locstats.py variable's names to avoid mentioning 'first def'.

krisb added a comment.Nov 24 2019, 9:38 AM

@krisb Thanks for working on this!

Let me summarize this quickly.

This will remove mixing between calculation against enclosing and adjusted scope. We have concluded the adjustment heuristics are confusing and wrong in many cases, so we are giving up on that. The alternative is to generate the DW_AT_start_scope attribute, that represents exactly the adjusted scope, and having that we will be able to calculate the debug location statistics for the pieces of code where the variable is alive.

Thanks for summarizing this! This is perfectly correct.

WDYT how "easy" will be to implement the DW_AT_start_scope attribute? I know that it might not be used anywhere else, but I think that having location statistics against variable's life will be very useful thing.

I haven't investigated the subject in detail, but my initial guess is that life range may be computed as a union of basic blocks or their parts that 1) belong to the enclosing scope and 2) have debug value instructions that describe the variable (either with a valid location or with an undef). It looks easy to implement, but for practical use, it's probably too rough. I'm going to take another look a bit later.

Please refactor the llvm/utils/llvm-locstats/llvm-locstats.py as well (there are variables containing *FromFirstDefinition).

Thanks! Fixed.

@krisb Thanks for working on this!

Let me summarize this quickly.

This will remove mixing between calculation against enclosing and adjusted scope. We have concluded the adjustment heuristics are confusing and wrong in many cases, so we are giving up on that. The alternative is to generate the DW_AT_start_scope attribute, that represents exactly the adjusted scope, and having that we will be able to calculate the debug location statistics for the pieces of code where the variable is alive.

WDYT how "easy" will be to implement the DW_AT_start_scope attribute? I know that it might not be used anywhere else, but I think that having location statistics against variable's life will be very useful thing.

See my reply from D69027 - though thinking more closely, I'm not sure start_scope would really answer the question as it seems like it has many of the same issues as the adjusted scope we're removing - it assumes the code is laid out in order. Any use of DW_AT_ranges would invalidate it (& I /think/ many & perhaps most scopes are fragmented into ranges in optimized code which is where any of these stats are interesting) & basic block reordering, etc, is sort of the nail in the coffin.

The spec looks a bit confusing here, but if I understand it correctly, it should be possible to represent 'adjusted' scope precisely in all the situations.
Regarding DW_AT_start_scope the spec (dwarfv5) says (3.9 Declarations with Reduced Scope) :

Any debugging information entry for a declaration (including objects, subprograms, types and modules) whose scope has an address range that is a subset of the address range for the lexical scope most closely enclosing the declared entity may have a DW_AT_start_scope attribute to specify that reduced range of addresses.
There are two cases:

  1. If the address range for the scope of the entry includes all of addresses for the containing scope except for a contiguous sequence of bytes at the beginning of the address range for the containing scope, then the address is specified using a value of class constant.

...

  1. Otherwise, the set of addresses for the scope of the entity is specified using a value of class rnglistsptr. This value indicates the beginning of a range list.

I understand this as 'If it's possible to represent a reduced scope as an offset to an enclosing lexical scope, it is actually the offset. But in cases when there are no ways to correctly use offset here, it is described in a ranges list, which represents the reduced scope itself'. The latter should allow handling fragmented scopes, blocks reordering, ..., etc.

avl added a comment.Nov 24 2019, 10:08 PM

@krisb the patch looks good to me. though I would prefer if someone from debuginfo project would approve it. Thank you.

@krisb Thanks for working on this!

Let me summarize this quickly.

This will remove mixing between calculation against enclosing and adjusted scope. We have concluded the adjustment heuristics are confusing and wrong in many cases, so we are giving up on that. The alternative is to generate the DW_AT_start_scope attribute, that represents exactly the adjusted scope, and having that we will be able to calculate the debug location statistics for the pieces of code where the variable is alive.

WDYT how "easy" will be to implement the DW_AT_start_scope attribute? I know that it might not be used anywhere else, but I think that having location statistics against variable's life will be very useful thing.

See my reply from D69027 - though thinking more closely, I'm not sure start_scope would really answer the question as it seems like it has many of the same issues as the adjusted scope we're removing - it assumes the code is laid out in order. Any use of DW_AT_ranges would invalidate it (& I /think/ many & perhaps most scopes are fragmented into ranges in optimized code which is where any of these stats are interesting) & basic block reordering, etc, is sort of the nail in the coffin.

"Yeah, that could be handy - but probably pretty complicated to propagate through the compiler & only have limited benefits to users (in normal debugger scenarios, rather than stats) - the difference between "this variable is out of scope" (or using another variable that would otherwise have been shadowed - that's pretty important) and "this variable's value is optimized out". The complexity of the implementation might be why it's not been implemented - possible that an implementation might be justifiable in a context of bigger improvements like is_stmt accuracy or sub-expression source range grouping, or other ideas that've been thrown around from time to time."

@dblaikie I see... There are many things to handle in order to make such stats valid.

@krisb Thanks for working on this!

Let me summarize this quickly.

This will remove mixing between calculation against enclosing and adjusted scope. We have concluded the adjustment heuristics are confusing and wrong in many cases, so we are giving up on that. The alternative is to generate the DW_AT_start_scope attribute, that represents exactly the adjusted scope, and having that we will be able to calculate the debug location statistics for the pieces of code where the variable is alive.

Thanks for summarizing this! This is perfectly correct.

WDYT how "easy" will be to implement the DW_AT_start_scope attribute? I know that it might not be used anywhere else, but I think that having location statistics against variable's life will be very useful thing.

I haven't investigated the subject in detail, but my initial guess is that life range may be computed as a union of basic blocks or their parts that 1) belong to the enclosing scope and 2) have debug value instructions that describe the variable (either with a valid location or with an undef). It looks easy to implement, but for practical use, it's probably too rough. I'm going to take another look a bit later.

Please refactor the llvm/utils/llvm-locstats/llvm-locstats.py as well (there are variables containing *FromFirstDefinition).

Thanks! Fixed.

@krisb Thanks for addressing the comments and considering the improvements!

@aprantl - seems mostly in your wheelhouse? (or is there someone else who owns these statistics more?)

aprantl accepted this revision.Dec 3 2019, 9:36 AM
This revision is now accepted and ready to land.Dec 3 2019, 9:36 AM
This revision was automatically updated to reflect the committed changes.
krisb added a comment.Dec 8 2019, 4:54 AM

Thank everyone for reviewing this!
Committed as 68f464ac2ef5de8cb2e8beaeee43c435c536539e.