This is an archive of the discontinued LLVM Phabricator instance.

[llvm-dwarfdump] Add support for filtering by DIE tags
Needs Review · Public

Authored by woodruffw on Nov 8 2021, 9:58 PM.

Details

Reviewers
jhenderson
Summary

Teaches llvm-dwarfdump the --tag=<tag> option, which allows
for filtering of DIEs by their DW_TAG_ values in a manner similar
to the current --name=<pattern> option.

--tag=<tag> can be composed with --name=<pattern>, allowing
uses like discovering every subprogram DIE whose name matches
the pattern Vector:

llvm-dwarfdump --tag subprogram --name Vector --regex some-binary
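
As a rough illustration (not taken from the diff), the option could be declared and parsed along these lines; the TagFilter and parseTagSpelling names and the help text are assumptions:

#include "llvm/BinaryFormat/Dwarf.h"   // dwarf::getTag, DW_TAG_invalid
#include "llvm/Support/CommandLine.h"  // cl::list
using namespace llvm;

// Hypothetical declaration, mirroring the cl::list style llvm-dwarfdump
// already uses for --name; the exact spelling here is illustrative.
static cl::list<std::string>
    TagFilter("tag", cl::desc("Only dump DIEs with the given DW_TAG value."),
              cl::value_desc("tag"));

// dwarf::getTag maps the full "DW_TAG_subprogram" spelling to its numeric
// value and returns DW_TAG_invalid for unknown strings; a short spelling
// such as "subprogram" would need the "DW_TAG_" prefix prepended first.
static unsigned parseTagSpelling(StringRef Spelling) {
  return dwarf::getTag(Spelling);
}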

Diff Detail

Event Timeline

woodruffw created this revision. Nov 8 2021, 9:58 PM
woodruffw requested review of this revision. Nov 8 2021, 9:58 PM
Herald added a project: Restricted Project. Nov 8 2021, 9:58 PM

I'll add a test as well; I just wanted to get an initial version up for feedback.

Any particular motivation/need for this?

I could see myself using this if I were to revisit some of my earlier DWARF work: in the past I've written scripts to search llvm-dwarfdump .debug_info output, looking for specific attributes with certain properties. For example, for the fragmented DWARF topic I looked at a little while ago, I searched debug output to find specific tags that needed splitting out from the rest of the DWARF. The script was quite slow because it had to trawl through a lot of debug data, even though most of it was irrelevant to the task at hand. Filtering down to specific tags would have sped this process up.

That being said, I'm also curious to know what the use-case is.

Don't forget to add the new option to llvm-dwarfdump's .rst file when updating.

woodruffw added a comment (edited). Nov 9 2021, 7:08 AM

Any particular motivation/need for this?

Yep. I work on a source and binary static analysis system that includes a DWARF information component. Our input programs can be extremely large (we effectively do LTO to bring everything into a single llvm::Module) and sometimes contain hundreds of identically named local variables, globals, formal parameters, etc. When debugging the DWARF information that LLVM emits, we use llvm-dwarfdump to filter down to the problematic DIEs. But this can still leave hundreds of DIEs, particularly when the problematic name is something generic like len. Having the --tag option would allow us to additionally filter by the tag we're interested in, e.g. keeping only the DW_TAG_formal_parameter DIEs whose DW_AT_name is len.

I think the change could be simplified a bit - renaming "filterByName" to just "filter" and having it do both name and tag filtering together, in a single pass?

woodruffw updated this revision to Diff 386009. Nov 9 2021, 4:39 PM

Rewrote to use just a single filter function.

I was thinking of something simpler/less invasive (no need to add a new data structure to collect all the DIEs, etc).

i.e.: What if the "filterByName" functions were renamed to "filter" and each took an extra argument, Tags, & did all the same stuff but also checked Tags?

I was thinking of something simpler/less invasive (no need to add a new data structure to collect all the DIEs, etc).

i.e.: What if the "filterByName" functions were renamed to "filter" and each took an extra argument, Tags, & did all the same stuff but also checked Tags?

I can do this, but I want to offer a bit of justification for the new structure first :-)

Maintainability-wise, I think the new approach is slightly easier to extend with additional filters (e.g. a --type filter for matching type DIEs): they can either be added to dieFilter or, with my previous changeset, composed inside a lambda. The control flow is also a little easier to follow and makes use of make_filter_range, instead of calling through an overloaded filterByName with duplicate call sites for Die.dump.

Performance-wise, pre-collecting all of the DIEs is no slower than the previous approach. I've pasted some local benchmarks below; the test file is a debug build of llvm-dwarfdump, containing approximately 150MB of debug info. My changes are *slightly* faster, but that's probably either noise or a side effect of the pre-collection simplifying the instructions needed to fetch each DIE.
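
For readers of the archive, a minimal sketch of the pre-collection plus make_filter_range shape being described (Dies, Tags, matchesNameFilter, and DumpOpts are assumed names, not identifiers from the posted diff):

// Collect every DIE up front, then dump only those the combined predicate
// accepts. Tags holds the parsed --tag values; matchesNameFilter stands in
// for the existing --name handling.
std::vector<DWARFDie> Dies;
for (const auto &CU : DICtx.normal_units())
  for (const auto &Entry : CU->dies())
    Dies.emplace_back(CU.get(), &Entry);

auto DieFilter = [&](const DWARFDie &Die) {
  bool TagOK = Tags.empty() || llvm::is_contained(Tags, Die.getTag());
  return TagOK && matchesNameFilter(Die); // both filters must accept the DIE
};

for (const DWARFDie &Die : llvm::make_filter_range(Dies, DieFilter))
  Die.dump(OS, /*Indent=*/0, DumpOpts);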

Before this changeset:

Performance counter stats for './bin/llvm-dwarfdump --name Vector --regex /home/william/tmp/llvm-dwarfdump' (100 runs):

         6,587.02 msec task-clock                #    0.999 CPUs utilized            ( +-  0.06% )
              936      context-switches          #    0.142 K/sec                    ( +-  3.60% )
               61      cpu-migrations            #    0.009 K/sec                    ( +-  2.65% )
           89,214      page-faults               #    0.014 M/sec                    ( +-  0.00% )
   26,887,760,726      cycles                    #    4.082 GHz                      ( +-  0.04% )  (62.46%)
      475,358,555      stalled-cycles-frontend   #    1.77% frontend cycles idle     ( +-  0.59% )  (62.48%)
    6,463,934,156      stalled-cycles-backend    #   24.04% backend cycles idle      ( +-  0.16% )  (62.50%)
   79,870,809,617      instructions              #    2.97  insn per cycle
                                                 #    0.08  stalled cycles per insn  ( +-  0.01% )  (62.52%)
   16,098,031,769      branches                  # 2443.901 M/sec                    ( +-  0.01% )  (62.54%)
       53,414,087      branch-misses             #    0.33% of all branches          ( +-  0.24% )  (62.54%)
   26,475,961,150      L1-dcache-loads           # 4019.413 M/sec                    ( +-  0.01% )  (62.50%)
       99,919,546      L1-dcache-load-misses     #    0.38% of all L1-dcache accesses  ( +-  3.81% )  (62.47%)
  <not supported>      LLC-loads
  <not supported>      LLC-load-misses

          6.59507 +- 0.00363 seconds time elapsed  ( +-  0.05% )

and with DIE pre-collection:

Performance counter stats for './bin/llvm-dwarfdump --name Vector --regex /home/william/tmp/llvm-dwarfdump' (100 runs):

         6,585.08 msec task-clock                #    0.999 CPUs utilized            ( +-  0.06% )
              842      context-switches          #    0.128 K/sec                    ( +-  2.58% )
               69      cpu-migrations            #    0.010 K/sec                    ( +-  2.59% )
          128,934      page-faults               #    0.020 M/sec                    ( +-  0.00% )
   26,807,844,451      cycles                    #    4.071 GHz                      ( +-  0.05% )  (62.46%)
      472,800,149      stalled-cycles-frontend   #    1.76% frontend cycles idle     ( +-  0.70% )  (62.49%)
    4,915,697,574      stalled-cycles-backend    #   18.34% backend cycles idle      ( +-  0.18% )  (62.50%)
   79,482,489,382      instructions              #    2.96  insn per cycle
                                                 #    0.06  stalled cycles per insn  ( +-  0.01% )  (62.52%)
   16,069,809,100      branches                  # 2440.337 M/sec                    ( +-  0.01% )  (62.54%)
       50,892,766      branch-misses             #    0.32% of all branches          ( +-  0.05% )  (62.53%)
   26,035,545,689      L1-dcache-loads           # 3953.718 M/sec                    ( +-  0.01% )  (62.49%)
      110,040,468      L1-dcache-load-misses     #    0.42% of all L1-dcache accesses  ( +-  3.55% )  (62.46%)
  <not supported>      LLC-loads
  <not supported>      LLC-load-misses

          6.59287 +- 0.00390 seconds time elapsed  ( +-  0.06% )

Overall, I think the absence of a performance regression and easier extensibility make the more invasive changes worth it. But I'm happy to roll back to a version that uses the nested filter functions instead, if you think it's not worth the churn.

Generally it's best to separate refactoring from semantic changes - do the refactoring either before or after the semantic change. I'm not sure this code is worth a separate change to improve its extensibility.

But if I were going to go there, I think I'd end up with something more like this:

dumpFiltered(dieFilter, DICtx.normal_units(), OS);
dumpFiltered(dieFilter, DICtx.dwo_units(), OS);
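
A sketch of what that dumpFiltered helper might look like, assuming the predicate-passing shape suggested above (none of these names come from the posted diff):

// Walk each unit in the range and dump the DIEs that the predicate accepts;
// Predicate is any callable taking a DWARFDie and returning bool.
template <typename Predicate>
static void dumpFiltered(Predicate P, DWARFContext::unit_iterator_range Units,
                         raw_ostream &OS) {
  for (const auto &CU : Units)
    for (const auto &Entry : CU->dies()) {
      DWARFDie Die(CU.get(), &Entry);
      if (P(Die))
        Die.dump(OS);
    }
}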
llvm/tools/llvm-dwarfdump/llvm-dwarfdump.cpp
459–465

Looks like this has different behavior from the original - this'll print if the name or the tag is satisfied, whereas the original printed only if both the name and the tag were satisfied, I think? From the patch description, it sounds like the original was the desired behavior, and this new behavior is unintended.

(speaking of: this needs test coverage)
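
Put differently, the intended predicate is presumably a conjunction along these lines (helper names are illustrative):

// Dump only when the name filter matches AND, if any --tag values were
// given, the DIE's tag is one of them; an empty tag list means "any tag".
bool NameOK = matchesNameFilter(Die);
bool TagOK = Tags.empty() || llvm::is_contained(Tags, Die.getTag());
if (NameOK && TagOK)
  Die.dump(OS);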

487

Guessing this should be a const ref rather than a value - I think the value type is std::string, so passing by value will copy the strings unnecessarily? (Might be worth actually naming std::string here to avoid confusion, even.)

495

Either const auto& or non-const auto here? (or explicitly DWARFDie) - we don't usually use top-level const on local values in the LLVM codebase.