This is an archive of the discontinued LLVM Phabricator instance.

Add a llvm-gsymutil tool that can convert object files to GSYM and perform lookups.
ClosedPublic

Authored by clayborg on Feb 19 2020, 5:31 PM.

Details

Summary

This patch creates the llvm-gsymutil binary that can convert object files to GSYM using the --convert <path> option. It can also dump and lookup addresses within GSYM files that have been saved to disk.

To dump a file:

llvm-gsymutil /path/to/a.gsym

To perform address lookups, like with atos, on GSYM files:

llvm-gsymutil --address 0x1000 --address 0x1100 /path/to/a.gsym

To convert a mach-o or ELF file, including any DWARF debug info contained within the object files:

llvm-gsymutil --convert /path/to/a.out --out-file /path/to/a.out.gsym

Conversion highlights:

  • convert DWARF debug info in mach-o or ELF files to GSYM
  • convert symbols in symbol table to GSYM and don't convert symbols that overlap with DWARF debug info
  • extract UUID from object files
  • extract .text (read + execute) section address ranges and filter out any DWARF or symbols that don't fall in those ranges.
  • if .text sections are extracted, and if the last gsym::FunctionInfo object has no size, cap the size to the end of the section the function was contained in

Dumping GSYM files will dump all sections of the GSYM file in textual format.

Event Timeline

clayborg created this revision.Feb 19 2020, 5:31 PM
Herald added a project: Restricted Project. · View Herald Transcript

I do not feel qualified to review gsym stuff, but I do think that the DataExtractor changes should go in a separate patch.

I do not feel qualified to review gsym stuff, but I do think that the DataExtractor changes should go in a separate patch.

I thought I remembered someone saying they didn't want dead code being checked in on its own. But I can split this out if needed.

I do not feel qualified to review gsym stuff, but I do think that the DataExtractor changes should go in a separate patch.

I thought I remembered someone saying they didn't want dead code being checked in on its own. But I can split this out if needed.

I'm not sure what was the context of that discussion, but that seem odd. Just this week, I was asked to do a similar split too (and my patch was much smaller than this one). Either way, I think that in this particular case, any concerns like that can be resolved by just linking DataExtractor patch to the patch which adds code for using it.

I do not feel qualified to review gsym stuff, but I do think that the DataExtractor changes should go in a separate patch.

I thought I remembered someone saying they didn't want dead code being checked in on its own. But I can split this out if needed.

I'm not sure what was the context of that discussion, but that seem odd. Just this week, I was asked to do a similar split too (and my patch was much smaller than this one). Either way, I think that in this particular case, any concerns like that can be resolved by just linking DataExtractor patch to the patch which adds code for using it.

That was a long time ago when I tried to check something into DataExtractor on its own. I am fine to split this up. It has a unit test, so it isn't actually dead code.

So the data extractor stuff has been pulled out in: https://reviews.llvm.org/D74991

I will update the diffs for this patch once that is checked in, so please ignore any DataExtractor stuff in this patch for now.

clayborg updated this revision to Diff 246338.Feb 24 2020, 4:24 PM

Update after D74991 to use DataExtractor changes from that patch.

wallace added inline comments.Feb 25 2020, 8:22 AM
llvm/tools/llvm-gsym/llvm-gsymutil.cpp
167

probably you don't want to append InputPath if it's an empty dsym directory

wallace accepted this revision.Feb 25 2020, 8:22 AM
This revision is now accepted and ready to land.Feb 25 2020, 8:22 AM

Adrian, any comments or objections?

aprantl accepted this revision.Feb 25 2020, 1:52 PM

My only concern is that it gets confusing to have both a gsymutil and a dsymutil but I think that ship has sailed when the format was called gsym.

Couple of coding style issues inline.

llvm/lib/DebugInfo/GSYM/ObjectFileTransformer.cpp
37

Generally, for readability it might be better to write these checks as

Expected<StringRef> SectNameOrErr = Sect.getName();
if (! SectNameOrErr) {
  consumeError(SectNameOrErr.takeError());
  continue;
}
SectName = *SectNameOrErr;
96
if (!Name) {
      logAllUnhandledError
      break;
}
llvm/tools/llvm-gsym/llvm-gsymutil.cpp
146
if (!...)
  return {};
364

if (!...) break

498

These deeply nested loops are very hard to follow.

if (LookupAddresses.empty())) {
   Gsym->dump(outs());
   continue;
}

// Lookup an address in a GSYM file and print any matches.

My only concern is that it gets confusing to have both a gsymutil and a dsymutil but I think that ship has sailed when the format was called gsym.

Yeah, easy to rename if needed. Don't care if we change the name, I am open to suggestions.

This revision was automatically updated to reflect the committed changes.
thakis added a subscriber: thakis.Feb 26 2020, 5:25 AM

Could we either rename the binary to llvm-gsym or the directory housing it to llvm-gsymutil, so that they match?

thakis added inline comments.Feb 26 2020, 5:29 AM
llvm/test/tools/llvm-gsymutil/cmdline.test
2

Also, since there are tests running this binary, llvm/test/CMakeLists.txt LLVM_TEST_DEPENDS should include llvm-gsymutil.

Sorry for all the comments! This is the last one I think.

llvm/tools/llvm-gsym/CMakeLists.txt
8

Finally, since you have ${LLVM_TARGETS_TO_BUILD} already, you don't need to list AllTargetsDescs / AllTargetsInfos – LLVM_TARGETS_TO_BUILD includes those already. (But if you can change your code to not need LLVM_TARGETS_TO_BUILD then it's better to depend on only the Descs and Infos instead, since then less code needs to be linked into llvm-gsymutil.)

Hi, this seems to have led to a test failure on our clang bots:

FAIL: LLVM :: tools/llvm-gsymutil/cmdline.test (34358 of 36213)
******************** TEST 'LLVM :: tools/llvm-gsymutil/cmdline.test' FAILED ********************
Script:
--
: 'RUN: at line 1';   llvm-gsymutil -h 2>&1 | /b/s/w/ir/k/recipe_cleanup/clangSeHACg/llvm_build_dir/bin/FileCheck --check-prefix=HELP /b/s/w/ir/k/llvm-project/llvm/test/tools/llvm-gsymutil/cmdline.test
: 'RUN: at line 2';   llvm-gsymutil --help 2>&1 | /b/s/w/ir/k/recipe_cleanup/clangSeHACg/llvm_build_dir/bin/FileCheck --check-prefix=HELP /b/s/w/ir/k/llvm-project/llvm/test/tools/llvm-gsymutil/cmdline.test
: 'RUN: at line 20';   llvm-gsymutil --version 2>&1 | /b/s/w/ir/k/recipe_cleanup/clangSeHACg/llvm_build_dir/bin/FileCheck --check-prefix=VERSION /b/s/w/ir/k/llvm-project/llvm/test/tools/llvm-gsymutil/cmdline.test
--
Exit Code: 1

Command Output (stderr):
--
/b/s/w/ir/k/llvm-project/llvm/test/tools/llvm-gsymutil/cmdline.test:3:7: error: HELP: expected string not found in input
HELP: OVERVIEW: A tool for dumping, searching and creating GSYM files.
      ^
<stdin>:1:1: note: scanning from here
/b/s/w/ir/k/recipe_cleanup/clangSeHACg/llvm_build_dir/test/tools/llvm-gsymutil/Output/cmdline.test.script: line 1: llvm-gsymutil: command not found
^

--

********************
Testing:  0.. 10.. 20.. 30.. 40.. 50.. 60.. 70.. 80.. 90.. 

Testing Time: 133.64s
********************
Failing Tests (1):
    LLVM :: tools/llvm-gsymutil/cmdline.test

Could you take a look? Thanks.

Builder: https://ci.chromium.org/p/fuchsia/builders/ci/clang-linux-x64/b8887418276173121952?

Fix for buildbots:

commit e4af56db27e5007ae6f6095a0ba0421211de9ba3 (HEAD -> master, origin/master)
Author: Greg Clayton <gclayton@fb.com>
Date: Wed Feb 26 10:30:04 2020 -0800

Fix buildbots after recent GSYM commit.

Added llvm-gsymutil to LLVM_TEST_DEPENDS.
clayborg marked an inline comment as done.Feb 26 2020, 10:40 AM

Could we either rename the binary to llvm-gsym or the directory housing it to llvm-gsymutil, so that they match?

I will do a follow up patch once builtbots are happy. Thanks for the comments.

llvm/tools/llvm-gsym/CMakeLists.txt
8

Ok, will do follow up patch for this.

Another thing that I realized too late: Is there a good reason to all it llvm-gsymutil instead of gsymutil? We only call llvm-dwarfdump llvm-dwarfdump to disambiguate it from other dwarfdump tools.

clayborg added a comment.EditedFeb 26 2020, 2:55 PM

Another thing that I realized too late: Is there a good reason to all it llvm-gsymutil instead of gsymutil? We only call llvm-dwarfdump llvm-dwarfdump to disambiguate it from other dwarfdump tools.

I am fine with "gsymutil" or "llvm-gsym". I will post a patch to rename things and clean up the deps of the tool as mentioned and we can discuss in there. One main question though: if we rename the tool do I need to rename the directory to match?

dblaikie added inline comments.
llvm/lib/DebugInfo/GSYM/FunctionInfo.cpp
173

Use !empty() rather than size() > 0

llvm/lib/DebugInfo/GSYM/GsymCreator.cpp
247

Use empty() rather than size() == 0

llvm/lib/DebugInfo/GSYM/LookupResult.cpp
41

Use early return to reduce indentation here, probably?

llvm/lib/DebugInfo/GSYM/ObjectFileTransformer.cpp
29–33

Might be worth reducing indentation here:

if (MachO)
  ...
  return UUID;
}
if (!ELF)
  return {};

/* all the ELF handling here, not indented */
31

This test doesn't seem to change the behavior of the program? (ie: copying an empty ArrayRef is a no-op here, so remove the conditional to simplify the code)

41

Use = rather than () to ensure only implicit conversions are in use:

StringRef SectName = *SectNameOrErr;

(applies in otehr places in this patch too)

46–51

Invert & unindent this:

if (!E) {
  consumeError(E.takeError());
  continue;
}

StringRef BuildIDData = *E;
...
76

Please remove const from local variables - this isn't common in LLVM & isn't common/consistent in this patch/code.

llvm/tools/llvm-gsym/llvm-gsymutil.cpp
172–175
260–271

(more else-after-return)

297

LLVM doesn't tend to use "const" on locals (& it isn't applied consistently even in this code, for instance - "ThreadCount", DICtx, Endian are not const). Please remove it. (it can be misleading/confusing - might make a reader think it was meant to be const reference)

503

This is implicit in C++ - do many other LLVM utilities have this line? I'd expect the dominant paradigm is to not have an explicit return 0/EXIT_SUCCESS & rely on C++'s default here -and if that is the dominant paradigm in LLVM, please follow it here & remove this line.

MaskRay added inline comments.
llvm/lib/DebugInfo/GSYM/GsymCreator.cpp
249

LLVM code tends to delete excess parentheses.

llvm/lib/DebugInfo/GSYM/ObjectFileTransformer.cpp
91

NoCopy is only used once.

Deleting it and using /*NoCopy=*/false at the call site. will be clearer/simpler.

llvm/test/tools/llvm-gsymutil/cmdline.test
3

See test/tools/llvm-strings/help.test for an example

llvm/tools/llvm-gsym/llvm-gsymutil.cpp
48

using namespace llvm::gsym; is preferred.

49

using namespace llvm::object; is preferred.

222

Missing a

232

Write auto type explicitly, unless obvious.

329

auto -> Error

428

InitLLVM X(argc, argv);.

You can find some modern cases in tools/llvm-objcopy and tools/llvm-readobj.

435

If AsmPrinter needed?

450

clang-format prefers /*Hidden=*/false (no space).

454

There are other outs() references below. The OS alias is not useful.

dblaikie and MaskRay: thanks for the code guideline and inline comments. I will fix this in an upcoming patch.