This is an archive of the discontinued LLVM Phabricator instance.

[llvm-dwarfdump] Fix misleading scope byte coverage statistics
ClosedPublic

Authored by Orlando on Aug 10 2020, 4:14 AM.

Details

Summary

Fixes PR46575.

Bump statistics version to 6.

Without this patch, for a variable described with a location list the stat
'sum_all_variables(#bytes in parent scope covered by DW_AT_location)' is
calculated by summing all bytes covered by the location ranges in the list and
capping the result to the number of bytes in the parent scope. With the patch,
only bytes which overlap with the parent DIE scope address ranges contribute to
the stat. A new stat 'sum_all_variables(#bytes in any scope covered by
DW_AT_location)' has been added which displays the total bytes covered when
ignoring scopes.

Diff Detail

Event Timeline

Orlando created this revision.Aug 10 2020, 4:14 AM
Orlando requested review of this revision.Aug 10 2020, 4:14 AM
dstenb added a subscriber: dstenb.Aug 10 2020, 6:57 AM

There are no FileCheck checks for the new metric, as far as I can tell?

I guess it could be good to add a lit test where the two metrics differ.

Orlando updated this revision to Diff 284641.Aug 11 2020, 3:49 AM
Orlando edited the summary of this revision. (Show Details)

Hi @dstenb, thanks for taking a look. I've updated the diff to include a test but it feels a little unwieldy. Is there a better way to test this?

I've also fixed a bug (good idea to request a test...) and renamed the stat to better reflect its purpose.

dblaikie added inline comments.Aug 11 2020, 1:25 PM
llvm/test/tools/llvm-dwarfdump/X86/stats-scope-bytes-covered.s
1–13 ↗(On Diff #284641)

Probably worth simplifying this test - it seems really long if it's only testing this property. (& if possible, including source code/steps to generate it - though probably better/easier to hand-craft DWARF for a case like this, since any case where LLVM produces locations wider than their scope is buggy & if that can't be tickled in a small example, maybe worth some hand-writing/tweaking of the DWARF or IR)

llvm/tools/llvm-dwarfdump/Statistics.cpp
300–305

FWIW, I hope it does emit DWARF like this in the not-too-distant future. (specifically, if the variable's location is described using an entry_value, it might be good to use that entry_value for the entire scope of the function, even if the variable's value is available through other means as well - it'd be a more efficient encoding than having two separate ranges that use entry_value split because there's a range that uses an immediate value in between)

Orlando added inline comments.Aug 12 2020, 10:30 AM
llvm/test/tools/llvm-dwarfdump/X86/stats-scope-bytes-covered.s
1–13 ↗(On Diff #284641)

I agree that the test is clunky, but I'm a bit stumped on what a good test would look like.

I would like to avoid testing from IR/MIR because, as you point out, the situation we are checking for is buggy output from clang. The output is prone to getting fixed and could cause a false failure in the future. That is why I chose to use assembly in the test, but it's not very human friendly, even with a smaller very contrived input MIR to generate the assembly.

yaml2obj seems like a promising solution but the yaml generated by obj2yaml is doesn't convey the purpose of the test either IMO.

So IMO a non-contrived (and more complex) yaml or .s test with the source in the comments is the way forward here, wdyt?

aprantl accepted this revision.Aug 12 2020, 4:25 PM

LGTM modulo David's comment about the test.

This revision is now accepted and ready to land.Aug 12 2020, 4:25 PM
jhenderson added inline comments.
llvm/test/tools/llvm-dwarfdump/X86/stats-scope-bytes-covered.s
1–13 ↗(On Diff #284641)

I looked at this test, and I honestly can't see the wood from the trees. There's a lot of fluff in it that you can delete, as it has zero relevance to what you are testing. A YAML version might make it easier to trim it down further, although the DWARF support for ELF yaml2obj is still undergoing refinement - see the various patches by @Higuoxing in recent months. It might be worth switching to YAML, and then working with @Higuoxing to highlight where improvements could be made to yaml2obj to allow removing much of the useless bits - this is one of the aims of yaml2obj: to allow you to write concise tests that do what you need without things getting too verbose. The same can sometimes be achieved in assembly too, but raw assembly isn't that readable for DWARF, in my opinion.

Having the input code might help, but there's a risk it will go stale in the future.

5–6 ↗(On Diff #284641)

Tip: Something I'm trying to encourage is to use ## for comments in lit tests (or equivalent for other comment markers), because it helps the comments stand out from the RUN and CHECK lines. Indeed, there's some suggestion that this should be adopted as "standard" practice to help FileCheck diagnostics (which for example here might complain about the use of "Check" in the first line not being a real CHECK directive).

Orlando updated this revision to Diff 285627.EditedAug 14 2020, 5:15 AM
Orlando marked 2 inline comments as done.
Orlando edited the summary of this revision. (Show Details)

Thanks @aprantl and @jhenderson.

I'd be happy to write a yaml2obj test. The situation that I want to represent is this (note the var location range and the lexical block high_pc and low_pc):

DW_TAG_compile_unit
  DW_AT_producer    ("clang version 11.0.0")
  DW_AT_language    (DW_LANG_C99)
  DW_AT_name        ("example.c")
  DW_AT_low_pc      (0x0000000000000000)
  DW_AT_high_pc     (0x000000000000000b)

  DW_TAG_subprogram
    DW_AT_low_pc    (0x0000000000000000)
    DW_AT_high_pc   (0x000000000000000b)
    DW_AT_name      ("fun")
    DW_AT_type      (0x00000061 "int")

    DW_TAG_lexical_block
      DW_AT_low_pc  (0x0000000000000005)
      DW_AT_high_pc (0x000000000000000a)

      DW_TAG_variable
        DW_AT_location      (0x00000000: 
           [0x0000000000000000, 0x0000000000000005): DW_OP_reg5 RDI)
        DW_AT_name  ("local")
        DW_AT_type  (0x00000061 "int")

      NULL

    NULL

  DW_TAG_base_type
    DW_AT_name      ("int")
    DW_AT_encoding  (DW_ATE_signed)
    DW_AT_byte_size (0x04)

  NULL

@jhenderson or @Higuoxing, is that something we're able to do with yaml2obj at the moment? Round-tripping my built test file through obj2yaml and yaml2obj (tools built at f5a252ed681) and using llvm-dwarfdump shows:

return.o:	file format elf64-x86-64

.debug_info contents:

In the mean time I've shortened the assembly test and added a comment showing the llvm-dwarfdump output.

I've also changed the new stat name back to 'sum_all_variables(#bytes in any scope covered by DW_AT_location)'. I changed this in the last update to look more like 'sum_all_variables(#bytes in parent scope)', but I misunderstood the meaning of that stat. The most similar stat is "sum_all_variables(#bytes in any scope covered by DW_AT_location)". The name isn't great because both the new stat and this one both include DW_AT_const_value locations AFAICT. I think "sum_all_variables(#bytes in any scope covered by a location description) might be better, but I wanted to keep it in line with the existing stats.

Hi,

Currently, obj2yaml only supports dumping the .debug_aranges section. You might not be able to convert the object file to YAML. I'm very happy to handcraft one for you.

I stripped most of the unused attributes and DIEs. The following YAML can be accepted by yaml2obj. (yaml2obj doesn't support emitting the .debug_loc section, so we have to use raw content to generate it.)
Please let me know if it doesn't work.

It looks like:

DW_TAG_compile_unit
  DW_AT_low_pc      (0x0000000000000000)
  DW_AT_high_pc     (0x000000000000000b)

  DW_TAG_subprogram
    DW_AT_low_pc    (0x0000000000000000)
    DW_AT_high_pc   (0x000000000000000b)

    DW_TAG_lexical_block
      DW_AT_low_pc  (0x0000000000000005)
      DW_AT_high_pc (0x000000000000000a)

      DW_TAG_variable
        DW_AT_location      (0x00000000: 
           [0x0000000000000000, 0x0000000000000005): DW_OP_reg5 RDI)

      NULL

    NULL

  NULL
--- !ELF
FileHeader:
  Class:   ELFCLASS64
  Data:    ELFDATA2LSB
  Type:    ET_EXEC
  Machine: EM_X86_64
Sections:
  - Name:         .debug_loc
    Type:         SHT_PROGBITS
    AddressAlign: 0x01
    Content:      '0000000000000000050000000000000001005500000000000000000000000000000000'
DWARF:
  debug_abbrev:
    - Code:     1
      Tag:      DW_TAG_compile_unit
      Children: DW_CHILDREN_yes
      Attributes:
        - Attribute: DW_AT_low_pc
          Form:      DW_FORM_addr
        - Attribute: DW_AT_high_pc
          Form:      DW_FORM_data4
    - Code:     2
      Tag:      DW_TAG_subprogram
      Children: DW_CHILDREN_yes
      Attributes:
        - Attribute: DW_AT_low_pc
          Form:      DW_FORM_addr
        - Attribute: DW_AT_high_pc
          Form:      DW_FORM_data4
    - Code:     3
      Tag:      DW_TAG_lexical_block
      Children: DW_CHILDREN_yes
      Attributes:
        - Attribute: DW_AT_low_pc
          Form:      DW_FORM_addr
        - Attribute: DW_AT_high_pc
          Form:      DW_FORM_data4
    - Code:     4
      Tag:      DW_TAG_variable
      Children: DW_CHILDREN_no
      Attributes:
        - Attribute: DW_AT_location
          Form:      DW_FORM_sec_offset
  debug_info:
    - Version:    4
      AbbrOffset: 0x00
      Entries:
        - AbbrCode: 1 ## DW_TAG_compile_unit
          Values:
            - Value: 0x00 ## DW_AT_low_pc
            - Value: 0x0b ## DW_AT_high_pc
        - AbbrCode: 2 ## DW_TAG_subprogram
          Values:
            - Value: 0x00 ## DW_AT_low_pc
            - Value: 0x0b ## DW_AT_high_pc
        - AbbrCode: 3 ## DW_TAG_lexical_block
          Values:
            - Value: 0x05 ## DW_AT_low_pc
            - Value: 0x05 ## DW_AT_high_pc
        - AbbrCode: 4 ## DW_TAG_variable
          Values:
            - Value: 0x00 ## DW_AT_sec_offset
        - AbbrCode: 0 ## NULL
        - AbbrCode: 0 ## NULL
        - AbbrCode: 0 ## NULL
Orlando updated this revision to Diff 285662.Aug 14 2020, 8:30 AM
Orlando marked 2 inline comments as done.

Thank you so much @Higuoxing, that's very kind!

The test now uses yaml2obj with the yaml provided by @Higuoxing.

Thanks. The newer test feels much clearer and more maintainable to me. I don't have enough knowledge in this area to provide more feedback on the patch as a whole though.

Thanks @jhenderson. I think this should have also addressed @dblaikie's concerns regarding the test, so I will aim to land this in a couple of days if there are no objections.

dblaikie added inline comments.Aug 19 2020, 12:35 PM
llvm/test/tools/llvm-dwarfdump/X86/stats-scope-bytes-covered.yaml
18–24

Might be worth making this test case a bit more interesting by having /some/ overlap (and some non-overlap) regions.

llvm/tools/llvm-dwarfdump/Statistics.cpp
296–312

Little concerned about the N^2 complexity of this approach, and wondering if an alternative approach might also address the limitation regarding overlapping locations.

Start with a range of intervals (without overlaps) for the enclosing scope. Then intersect any location ranges with that collection of intervals and store those in another non-overlapping interval range.

A small data structure that lets you build a non-overlapping list of intervals (merging intervals, etc), keeping them sorted - then it'll be easier to find other overlapping regions.

eg:

Intervals Scope;
for (address ranges in scope)
  Scope.insert(R);
Intervals Loc;
for (address ranges in location) {
  R' = Scope.intersection(R);
  Loc.insert(R'); 
}
scope bytes covered = Loc.total_coverage/size/something

But, yeah, maybe the implementation of that's more complicated than is justified for this for now.

Orlando added inline comments.Aug 20 2020, 2:57 AM
llvm/tools/llvm-dwarfdump/Statistics.cpp
296–312

Yeah, that does look like a much better solution. If we want to go in that direction I might need to come back to this after some higher priority work (though I'd be happy to do so!).

A middle ground option might be to use IntervalMap. It's not perfect for this case though because it maps intervals to values, and only coalesces intervals if they have the same value. We don't have 'values' to map here - we only care about the intervals themselves. So we'd have to fudge it slightly by giving every interval some constant value (e.g. 0). I'm not really a fan of this idea, but I thought it'd be at least worth bringing up.

dblaikie added inline comments.Aug 20 2020, 3:44 PM
llvm/test/tools/llvm-dwarfdump/X86/stats-scope-bytes-covered.yaml
18–24

Ping on this.

llvm/tools/llvm-dwarfdump/Statistics.cpp
296–312

Yeah, I looked at/considered IntervalMap but I'm not sure it provides an easy way to get the intersection of one interval with any number of intervals in the map, for instance - or intersectionally reassign the value (if it could do that (eg: given a map with [1, 5) -> 0, [7, 10) -> 0 and an interval [3, 8) -> 1, it could produce [1, 3) -> 0, [3, 5) -> 1, [7, 8) -> 1, [8, 10) -> 0, then you could use it to do this work & do a linear scan over the map summing up all ranges that -> 1. Maybe it's viable to add that functionality to IntervalMap, I'm not sure.)

But I'm OK with that being a follow-up/future work.

Higuoxing added inline comments.Aug 20 2020, 8:57 PM
llvm/test/tools/llvm-dwarfdump/X86/stats-scope-bytes-covered.yaml
44–73

Hi, yaml2obj now supports emitting multiple abbrev tables after 290e399f967390648ec7cf53c4b75a883a90b609 and f7ff0ace96db9164dcde232c36cab6519ea4fce8. We should insert an additional key Table: in debug_abbrev:. yaml2obj and obj2yaml are heavily developed recently, if something doesn't work please feel free to ping me, thanks!

Orlando updated this revision to Diff 286981.Aug 21 2020, 2:56 AM
Orlando marked 3 inline comments as done.

+ Rebase
+ Update test for latest yaml2obj with change suggested by @Higuoxing (thank you!).
+ Make the test more interesting, @dblaikie wdyt?

dblaikie accepted this revision.Aug 21 2020, 9:31 AM

Looks good, thanks!

@Orlando Thanks for doing this!

Have you tested llvm/utils/llvm-locstats/llvm-locstats.py (along with its tests)? Do we need to change something there as well?

Otherwise, lgtm!

Hi @djtodoro thanks for taking a look!

Have you tested llvm/utils/llvm-locstats/llvm-locstats.py (along with its tests)?

Yes, all tests pass currently.

Do we need to change something there as well?

We don't need to update llvm-locstats because the stat "#bytes in parent scope covered by DW_AT_location" - which locstats reads - has been updated here. And IMO the new stat "sum_all_variables(#bytes in any scope covered by DW_AT_location)" is not something we need locstats to display because it doesn't affect coverage.

Oops, I didn't include the Phabricator revision line in the commit message.

3ff4d75c9cbffb12c5c690c389e3977ea6811042.

Please close the revision as well.

Orlando closed this revision.Aug 25 2020, 1:57 AM