This is an archive of the discontinued LLVM Phabricator instance.


Differential D105985

Support GSYM in llvm-symbolizer.
Needs Review · Public

Authored by simon.giesecke on Jul 14 2021, 7:39 AM.


Details

Reviewers

clayborg
jhenderson

Summary

Added GSYM-only tests.

Diff Detail

Repository: rG LLVM Github Monorepo

Unit Tests: Failed

	Time	Test
	2,470 ms	x64 debian > libarcher.barrier::barrier.c
	2,690 ms	x64 debian > libarcher.critical::critical.c
	2,440 ms	x64 debian > libarcher.critical::lock-nested.c
	2,650 ms	x64 debian > libarcher.races::critical-unrelated.c
	2,700 ms	x64 debian > libarcher.races::lock-nested-unrelated.c
		View Full Test Results (18 Failed)

Event Timeline

simon.giesecke created this revision. · Jul 14 2021, 7:39 AM

Herald added a reviewer: jhenderson. · Jul 14 2021, 7:39 AM

Herald added subscribers: rupprecht, hiraditya, mgorny.

simon.giesecke requested review of this revision. · Jul 14 2021, 7:39 AM

Herald added a project: Restricted Project. · Jul 14 2021, 7:39 AM

Herald added subscribers: llvm-commits, MaskRay.

simon.giesecke added a parent revision: D105982: Reformat files. · Jul 14 2021, 7:39 AM

Harbormaster completed remote builds in B113978: Diff 358602. · Jul 14 2021, 7:40 AM

There are several TODO markers in the code. Please provide some feedback on those.

Resolving duplication between getInliningInfoForAddress and getLineInfoForAddress is something I am currently taking care of.

Resolved duplication between getLineInfoForAddress and getInliningInfoForAddress.

Note that sym-gsymonly.test is mostly a copy of sym.test, using a different input file. The only other difference is that I removed the column number for main, which we don't have available from the GSYM file, IIUC.

Harbormaster completed remote builds in B113984: Diff 358608. · Jul 14 2021, 8:41 AM

clayborg added inline comments. · Jul 14 2021, 12:56 PM

llvm/lib/DebugInfo/GSYM/GsymDIContext.cpp
26	If Specifier.FNKind is set to DINameKind::ShortName, we want to demangle the name into LineInfo.FunctionName. If it is set to DINameKind::LinkageName or DINameKind::None, set it to the current name. Demangling can be tricky, as there are many name manglings. Not sure if there is a one-stop shop to demangle all sorts of names in LLVM.
30–32	We should watch out for an empty Location.Dir and also for an empty Location.Base.
37	Move DILineInfoSpecifier::FileLineInfoKind::RawValue up to the DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath case. RawValue is described in comments as: // RawValue is whatever the compiler stored in the filename table. Could be // a full path, could be something else. So we should emit this as a full path of what was found in the GSYM
43–44	The header file describes DILineInfoSpecifier::FileLineInfoKind::RelativeFilePath using: // Relative to the compilation directory. We don't have the compilation directory here, so I would just emit a full path, so move "case DILineInfoSpecifier::FileLineInfoKind::RelativeFilePath:" up to the case for "DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath".
50	We don't have the source. I would avoid attempting to read it from disk as well because you might be symbolicating something from another machine or architecture.
53	Yes, set to zero
55–57	yeah, we don't have this info
59	We can. If it is set to DINameKind::ShortName, we want to demangle the name into LineInfo.FunctionName. If it is set to DINameKind::LinkageName or DINameKind::None, set it to the current name.
67	Check the Address for a valid section index: if (Address.SectionIndex != llvm::object::SectionedAddress::UndefSection) return {};
69–70	You need to consume the error, or this will crash.
76–77	So the "Result.FuncName" is the name of the concrete function or symbol if there was no debug info for a symbol in the symbol table. So you only want to use this if "Result.Locations" is empty. If "Result.Locations" is not empty, you don't want to use that because the "Result.Locations.front()" might point to an inlined function. The name in the SourceLocation will be the inlined function name. The main question is what other symbolizers do in this case. Let's say we are in the concrete function "main" and the "Result.Locations.front()" points to something like "std::vector<int>::empty()" at "/.../vector:123", do other symbolizers create a name that includes the inlined function + the concrete function? I would find an address in a DWARF file that points to an inlined function and symbolize it using the DWARF and see what gets output for the DWARF. The other issue here is that GSYM can return multiple locations for a single address since GSYM will unwind the inline call stack. The LookupResult contains an array of locations. Do we not want to convey this back? Or does the LLVM symbolizer always want the deepest inline function for a given address? But seeing as below we get the inlining information in GsymDIContext::getInliningInfoForAddress(...), maybe we should always be returning the Result.FuncName? It really depends on what the other symbolizers do. I would make a small DWARF file, convert it to GSYM, then do lookups using the GSYM and the DWARF and make sure the DI classes match.
78–82
98	If this isn't used, is there a reason we are converting it?
100	If you wanted to do this you would get the full gsym::FunctionInfo from the GsymReader and convert any addresses that fall into this range.

if (Address.SectionIndex != llvm::object::SectionedAddress::UndefSection)
  return DILineInfoTable();
if (auto LineTableOrErr = Reader->getFunctionInfo(Address.Address)) {
  // Iterate over line table entries and take the ones that fall in the range.
} else {
  consumeError(LineTableOrErr.takeError());
  return DILineInfoTable();
}
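The file-name handling suggested in the comments on lines 30–32, 37 and 43–44 can be sketched as follows. This is a standalone illustration, not the patch's code: `Location`, `fullPath` and `fileNameFor` are hypothetical stand-ins, the enum only mirrors the names of `DILineInfoSpecifier::FileLineInfoKind`, the `/` separator is an assumption (real LLVM code would more likely go through `llvm::sys::path`), and returning an empty string for `None` is also an assumption.

```cpp
#include <string>

// Hypothetical stand-in mirroring the names of
// DILineInfoSpecifier::FileLineInfoKind; not the LLVM type itself.
enum class FileLineInfoKind {
  None,
  RawValue,
  BaseNameOnly,
  RelativeFilePath,
  AbsoluteFilePath
};

// Hypothetical stand-in for a GSYM source location's directory and base name.
struct Location {
  std::string Dir;  // may be empty
  std::string Base; // may be empty
};

// Joins Dir and Base, tolerating either part being empty.
// Assumption: '/' is used as the separator.
std::string fullPath(const Location &Loc) {
  if (Loc.Dir.empty())
    return Loc.Base;
  if (Loc.Base.empty())
    return Loc.Dir;
  if (Loc.Dir.back() == '/')
    return Loc.Dir + Loc.Base;
  return Loc.Dir + "/" + Loc.Base;
}

// RawValue and RelativeFilePath share the AbsoluteFilePath case, because no
// compilation directory is available to compute a relative path against.
std::string fileNameFor(const Location &Loc, FileLineInfoKind Kind) {
  switch (Kind) {
  case FileLineInfoKind::None:
    return ""; // assumption: leave the file name unset
  case FileLineInfoKind::BaseNameOnly:
    return Loc.Base;
  case FileLineInfoKind::RawValue:
  case FileLineInfoKind::RelativeFilePath:
  case FileLineInfoKind::AbsoluteFilePath:
    return fullPath(Loc);
  }
  return "";
}
```

Grouping the three path kinds into one case keeps the empty-`Dir` and empty-`Base` checks in a single helper.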

I have zero experience with or knowledge about gsym, so I can't comment on the implementation details, but some higher-level comments:

Make sure to review the llvm-symbolizer documentation to see if there's anything there that needs adding/updating given this change (there may not be).
Can you avoid using pre-canned binaries for the test input, by somehow generating the inputs on the fly? The existing pre-canned binaries in the llvm-symbolizer tests are already sub-optimal, in my opinion, as they are opaque and make testing less flexible.

llvm/test/tools/llvm-symbolizer/sym-gsymonly.test
2	Probably worth a top-level comment explaining what this test is supposed to be testing. Also probably you can just rename this test "gsym.test". I'm not sure what the "only" bit is about, and the "sym-" prefix similarly doesn't look to add any meaning.
21	Nit: I'd normalise the comment markers throughout this test, using the following two rules:
1. True comments start with `##` (followed by a space). Applies for the source above. This helps distinguish the test details from actual comments.
2. Add `#` as a comment marker to all RUN lines, much as we do in the vast majority of newer tests. Follow it with a space again. CHECK and equivalent lines should be `# CHECK` (with space).
Finally, as noted out-of-line, I've not looked at the implementation, but do you need all these individual test-cases for gsym testing? Most of them look more like they're testing generic symbolizer behaviour, which is therefore already covered elsewhere.

In D105985#2879253, @jhenderson wrote:

I have zero experience with or knowledge about gsym, so I can't comment on the implementation details, but some higher-level comments:

Make sure to review the llvm-symbolizer documentation to see if there's anything there that needs adding/updating given this change (there may not be).

Sure, I'll check that.

Can you avoid using pre-canned binaries for the test input, by somehow generating the inputs on the fly? The existing pre-canned binaries in the llvm-symbolizer tests are already sub-optimal, in my opinion, as they are opaque and make testing less flexible.

The sym-gsymonly test is mostly a copy of the sym test, and the binary is just the stripped binary. These additional files don't need to be part of the repository, we could run llvm-gsymutil --convert and strip DWARF from it as part of the test.

If this should be migrated to remove the use of the canned binary entirely, I fear I am not knowledgeable enough on the test infrastructure to do that.

This should probably have been stated in the test file.

simon.giesecke added inline comments. · Jul 15 2021, 1:38 AM

llvm/lib/DebugInfo/GSYM/GsymDIContext.cpp
26	Hm, what do you suggest to do here then? I think I am not knowledgeable enough to implement the demangling without further guidance. Can we leave that to future work? In that case, should we fail here or fall back to not demangling, for now? FWIW, I found that `PDBContext` returns the empty string in case of `DINameKind::None`.
76–77	The other issue here is that GSYM can return multiple locations for a single address since GSYM will unwind the inline call stack. The LookupResult contains an array of locations. Do we not want to convey this back? Or does the LLVM symbolizer always want the deepest inline function for a given address? Well, there's `getInliningInfoForAddress` which returns all inline locations, which is used when the `-inlines` option is set. When it is not set, I implemented the same behaviour here as the DWARF implementation as checked by `sym.test`. I don't know what's the rationale for that. I would make a small DWARF file, convert it to GSYM, then do lookups using the GSYM and the DWARF and making sure the DI classes match. That's basically what I did. I took the existing `addr.exe` from the `sym.test` test case, converted that to GSYM and stripped DWARF from the object file, and ensured that the test cases create the same output, which they do, except for the missing column information. This covers the inline case. Not sure if there are any tests that check consistency of the DWARF vs. PDB behaviour here.
98	Well, this is an implementation of the `DIContext` interface, and ideally we would have a full implementation of the interface (or we should at least have a way to indicate that we only have a partial implementation?). The function is called in the context of JIT, by several event listeners and from `llvm-rtdyld` right now. I wonder if GSYM support is required there, but if only DWARF is supported there anyway, why does that use the format-neutral DIContext interface? I didn't intend to implement this as part of this patch in any case.
llvm/test/tools/llvm-symbolizer/sym-gsymonly.test
2	Probably worth a top-level comment explaining what this test is supposed to be testing. Definitely, I'll add that. Also probably you can just rename this test "gsym.test". I'm not sure what the "only" bit is about, and the "sym-" prefix similarly doesn't look to add any meaning. "gsymonly" refers to the fact that the binary doesn't have DWARF debug info but only a corresponding GSYM file. Another case would be that we have both a GSYM file and DWARF debug info. This should probably be tested as well.
21	As I mentioned above, this is indeed a copy of sym.test, using a different input binary. The results should be the same, except for the fact that we don't get any column information from GSYM. I checked the list of test cases again, and arguably some tests seem to only test the formatting of the output, and this is somehow orthogonal to the data source used for symbolication. But at least some are passed into the `DIContext` implementation (inlining, basename, ...), and I am not sure if we should make assumptions on the implementation detail of which command line options interact with the `DIContext` implementation at this level. If you think specific test cases should be removed here, please suggest them.
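The question in the 76–77 thread (which name and location to report when a lookup yields an inline call stack) can be made concrete with a small standalone model. Everything here is hypothetical: `Frame`, `Lookup`, `singleFrame` and `allFrames` are illustrative names, the frames are assumed to be ordered innermost first, and the mapping shown is only the one clayborg describes (use the front location when there is one, fall back to the concrete function name otherwise). It is not a statement of what llvm-symbolizer actually prints.

```cpp
#include <string>
#include <vector>

// Hypothetical model of one symbolized frame.
struct Frame {
  std::string Name;
  std::string File;
  unsigned Line;
};

// Hypothetical model of a lookup result. Assumption: Locations is ordered
// innermost (most deeply inlined) frame first and may be empty when only a
// symbol-table name is known.
struct Lookup {
  std::string FuncName;
  std::vector<Frame> Locations;
};

// Single-result query: take the front location if any, otherwise fall back
// to the concrete function name with no file or line.
Frame singleFrame(const Lookup &R) {
  if (R.Locations.empty())
    return {R.FuncName, "", 0};
  return R.Locations.front();
}

// Inlining query: report every frame of the inline call stack.
std::vector<Frame> allFrames(const Lookup &R) {
  if (R.Locations.empty())
    return {singleFrame(R)};
  return R.Locations;
}
```

Comparing the output of both queries against the DWARF path for the same address, as suggested above, is what would confirm or refute this mapping.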

Addressed some comments by clayborg.

simon.giesecke added inline comments. · Jul 15 2021, 2:31 AM

llvm/lib/DebugInfo/GSYM/GsymDIContext.cpp
26	DWARFDie also returns an empty string in case of `DINameKind::None`, so I guess we should do the same here.
59	Marking this as done, as this is the same discussion as above, I also moved the code comment up.

Harbormaster completed remote builds in B114180: Diff 358875. · Jul 15 2021, 3:19 AM

clayborg requested changes to this revision. · Jul 15 2021, 3:55 PM

clayborg added inline comments.

llvm/lib/DebugInfo/GSYM/GsymDIContext.cpp

99–100

as long as we are doing this layer, we might as well fill this in. The code above is a good start. Something like:

if (Address.SectionIndex != llvm::object::SectionedAddress::UndefSection)
  return DILineInfoTable();

if (auto FuncInfoOrErr = Reader->getFunctionInfo(Address.Address)) {
  if (FuncInfoOrErr->OptLineTable) {
    const gsym::LineTable &LT = *FuncInfoOrErr->OptLineTable;
    const uint64_t StartAddr = Address.Address;
    const uint64_t EndAddr = Address.Address + Size;
    for (const gsym::LineEntry &LineEntry : LT) {
      if (StartAddr <= LineEntry.Addr && LineEntry.Addr < EndAddr) {
        // Use LineEntry.Addr, LineEntry.File (which is a file index into the 
        // files tables from the GsymReader), and LineEntry.Line (source line
        // number) to add stuff to the DILineInfoTable
      }
    }
  }
} else {
  consumeError(FuncInfoOrErr.takeError());
  return DILineInfoTable();
}
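The range filter at the core of the snippet above can be tried in isolation. A minimal sketch, assuming a plain `std::vector` in place of `gsym::LineTable`; the `LineEntry` struct is a stand-in that only borrows the field names `Addr`, `File` and `Line`, and `entriesInRange` is a hypothetical helper. One deliberate difference from the snippet: the upper bound is checked by subtraction so that `Start + Size` cannot wrap around.

```cpp
#include <cstdint>
#include <vector>

// Stand-in for a GSYM line table row; not the LLVM type.
struct LineEntry {
  uint64_t Addr; // start address of the row
  uint32_t File; // index into the file table
  uint32_t Line; // source line number
};

// Returns the rows whose address lies in the half-open range
// [Start, Start + Size).
std::vector<LineEntry> entriesInRange(const std::vector<LineEntry> &Table,
                                      uint64_t Start, uint64_t Size) {
  std::vector<LineEntry> Out;
  for (const LineEntry &E : Table) {
    // E.Addr - Start is only evaluated when E.Addr >= Start, so it cannot
    // underflow, and comparing it to Size avoids computing Start + Size.
    if (E.Addr >= Start && E.Addr - Start < Size)
      Out.push_back(E);
  }
  return Out;
}
```

Each surviving row would then be turned into an address and `DILineInfo` pair for the `DILineInfoTable`.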

This revision now requires changes to proceed. · Jul 15 2021, 3:55 PM

simon.giesecke marked 5 inline comments as done. · Jul 16 2021, 12:27 AM

simon.giesecke added inline comments.

llvm/lib/DebugInfo/GSYM/GsymDIContext.cpp
99–100	Ok.

Addressed clayborg's comments; implemented getLineInfoForAddressRange.

@clayborg Could you comment on the TODOs referring to llvm-symbolizer command line options:

// TODO We should provide an option to provide an alternative directory for
// GSYM files.

and

// TODO There should be an option that disables the preference for GSYM.

Does this make sense? Then I'll add these options.
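To make the two TODOs concrete, here is one way the lookup could behave once both options exist. This is a sketch under assumptions, not the patch's logic: `GsymOptions`, its fields and `gsymCandidatePath` are hypothetical names (no real llvm-symbolizer flags are implied), the `<binary>.gsym` naming is inferred from the test inputs `addr-gsymonly.exe` and `addr-gsymonly.exe.gsym`, and `/` is assumed as the path separator.

```cpp
#include <string>

// Hypothetical option set corresponding to the two TODOs.
struct GsymOptions {
  bool DisableGsym = false; // turn off the preference for GSYM entirely
  std::string GsymDir;      // alternative directory; empty = next to binary
};

// Returns the path at which a GSYM file would be looked for, or an empty
// string if GSYM lookup is disabled.
std::string gsymCandidatePath(const std::string &BinaryPath,
                              const GsymOptions &Opts) {
  if (Opts.DisableGsym)
    return "";
  if (Opts.GsymDir.empty())
    return BinaryPath + ".gsym";

  // Keep only the file name of the binary and place it in GsymDir.
  std::string::size_type Slash = BinaryPath.find_last_of('/');
  std::string Base = Slash == std::string::npos
                         ? BinaryPath
                         : BinaryPath.substr(Slash + 1);
  std::string Dir = Opts.GsymDir;
  if (Dir.back() != '/')
    Dir += '/';
  return Dir + Base + ".gsym";
}
```

With default options, `gsymCandidatePath("/bin/addr.exe", GsymOptions())` yields `/bin/addr.exe.gsym`; the caller would still have to check that the file exists before preferring it over DWARF.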

Added top-level comment to sym-gsymonly.test

In D105985#2879253, @jhenderson wrote:

Make sure to review the llvm-symbolizer documentation to see if there's anything there that needs adding/updating given this change (there may not be).

I checked llvm/docs/CommandGuide/llvm-symbolizer.rst, and it doesn't say anything right now about how the data source is selected. Maybe it should, and I can add something, but then this should probably be more than just saying that GSYM is preferred when it's present. I think there's similar logic for .dwp files?

Harbormaster completed remote builds in B114433: Diff 359235. · Jul 16 2021, 1:10 AM

In D105985#2882467, @simon.giesecke wrote:

In D105985#2879253, @jhenderson wrote:

Make sure to review the llvm-symbolizer documentation to see if there's anything there that needs adding/updating given this change (there may not be).

I checked llvm/docs/CommandGuide/llvm-symbolizer.rst, and it doesn't say anything right now about how the data source is selected. Maybe it should, and I can add something, but then this should probably be more than just saying that GSYM is preferred when it's present. I think there's similar logic for .dwp files?

I don't think we need to go into that in this change. I guess the point of the llvm-symbolizer documentation is more about how to use the tool, not what does the tool do under-the-hood, but I thought it was worth checking.

In D105985#2879285, @simon.giesecke wrote:

Can you avoid using pre-canned binaries for the test input, by somehow generating the inputs on the fly? The existing pre-canned binaries in the llvm-symbolizer tests are already sub-optimal, in my opinion, as they are opaque and make testing less flexible.

The sym-gsymonly test is mostly a copy of the sym test, and the binary is just the stripped binary. These additional files don't need to be part of the repository, we could run llvm-gsymutil --convert and strip DWARF from it as part of the test.

If this should be migrated to remove the use of the canned binary entirely, I fear I am not knowledgeable enough on the test infrastructure to do that.

This should probably have been stated in the test file.

I don't know how gsym data is actually generated, so some of these suggestions may not make much sense.
Option 1) Add the test to the new cross-project-tests test directory. This was created for tests that rely on clang/lld or whatever to generate valid test inputs from source, without the need for a canned binary. You'd include the source and build it directly at run-time. This would only work though if your test addresses are unlikely to change as changes occur in the compiler/linker. llvm-symbolizer was one of the primary motivations for this actually.
Option 2) Check in the assembly required to build the input, and use llvm-mc and other tools (llvm-strip/llvm-gsymutil etc) to turn it into the desired output at runtime.
Option 3) Generate the gsym data directly at run-time somehow.

llvm/test/tools/llvm-symbolizer/sym-gsymonly.test
2–4	I'd avoid referencing other tests as part of this comment: sym.test has been on my wishlist as something I'd love to rewrite if I had the time, due to its usage of precanned binaries, conflation of testing of multiple unrelated options and general vagueness of the test intent. Ideally, we'd avoid mirroring it entirely, in favour of testing the things that need to be tested specifically for this format.
21	I'm a bit constrained on time at the moment, so can't really dig into the source to look at what makes sense to test. Perhaps @clayborg has some suggestions there. I think it's reasonable to make assumptions based on the current implementation, about what is orthogonal and what isn't. If people refactor the area, to change where information is retrieved, it's probably reasonable to expect them to ensure test coverage is still sufficient, but I don't really know. You certainly should be able to drop the llvm-addr2line cases: llvm-addr2line is basically llvm-symbolizer with some different defaults on the options, and one or two minor formatting differences. The underlying source of information is irrelevant. Similarly, you can drop testing that different aliases mean the same thing, as these are tested elsewhere.

In D105985#2882537, @jhenderson wrote:

In D105985#2882467, @simon.giesecke wrote:

In D105985#2879253, @jhenderson wrote:

Make sure to review the llvm-symbolizer documentation to see if there's anything there that needs adding/updating given this change (there may not be).

I checked llvm/docs/CommandGuide/llvm-symbolizer.rst, and it doesn't say anything right now about how the data source is selected. Maybe it should, and I can add something, but then this should probably be more than just saying that GSYM is preferred when it's present. I think there's similar logic for .dwp files?

I don't think we need to go into that in this change. I guess the point of the llvm-symbolizer documentation is more about how to use the tool, not what does the tool do under-the-hood, but I thought it was worth checking.

Ok.

In D105985#2879285, @simon.giesecke wrote:

Can you avoid using pre-canned binaries for the test input, by somehow generating the inputs on the fly? The existing pre-canned binaries in the llvm-symbolizer tests are already sub-optimal, in my opinion, as they are opaque and make testing less flexible.

The sym-gsymonly test is mostly a copy of the sym test, and the binary is just the stripped binary. These additional files don't need to be part of the repository, we could run llvm-gsymutil --convert and strip DWARF from it as part of the test.

If this should be migrated to remove the use of the canned binary entirely, I fear I am not knowledgeable enough on the test infrastructure to do that.

This should probably have been stated in the test file.

I don't know how gsym data is actually generated, so some of these suggestions may not make much sense.
Option 1) Add the test to the new cross-project-tests test directory. This was created for tests that rely on clang/lld or whatever to generate valid test inputs from source, without the need for a canned binary. You'd include the source and build it directly at run-time. This would only work though if your test addresses are unlikely to change as changes occur in the compiler/linker. llvm-symbolizer was one of the primary motivations for this actually.
Option 2) Check in the assembly required to build the input, and use llvm-mc and other tools (llvm-strip/llvm-gsymutil etc) to turn it into the desired output at runtime.
Option 3) Generate the gsym data directly at run-time somehow.

Hm, I think it makes sense to have a common ground for testing both the DWARF and the GSYM path, to ensure these are (mostly) consistent. If I rewrite this to a different approach, then sym.test should be similarly changed. OTOH, this might mean that rewriting both tests might be done at a later point?

I can remove the second canned binary (actually, it might not be required at all) and the canned GSYM file, and create both at runtime from addr.exe. However, the point is that this uses the same underlying binary as sym.test.

Generally, I think option 1 would be too fragile, and I guess that's the reason why sym.test uses the canned binary?

Option 2... might be feasible, but I don't readily know how to do that.

Option 3... hm I am not sure what you imagine there? Generate a GSYM file without a corresponding binary? That might make sense for additional tests, but again, the test I added was purposefully using the same binary as sym.test.

In D105985#2882575, @simon.giesecke wrote:

Hm, I think it makes sense to have a common ground for testing both the DWARF and the GSYM path, to ensure these are (mostly) consistent. If I rewrite this to a different approach, then sym.test should be similarly changed. OTOH, this might mean that rewriting both tests might be done at a later point?

I don't disagree that in principle any debug information implementation should be able to represent the same things, and therefore the testing here should be (in an ideal world) identical. However, there are a couple of points worth raising: the existing sym.test is testing features that are not specific to the debug format. Ideally, they should be pulled into their own test file (see above my comments about rewriting that test). Also, copying this test as-is would increase our maintenance burden over the longer term. A better approach might be to write the GSYM test from scratch, as if the DWARF test didn't exist, and then write a DWARF equivalent one (alternatively the other way around would also work). The important parts of sym.test could then be split off.

I can remove the second canned binary (actually, it might not be required at all) and the canned GSYM file, and create both at runtime from addr.exe. However, the point is that this uses the same underlying binary as sym.test.

As noted above, I don't think it's good for sym.test to be using a canned binary either. The reason it does is purely because at the time of writing it wasn't possible to avoid the canned binary. We have ways to do that now however.

Generally, I think option 1 would be too fragile, and I guess that's the reason why sym.test uses the canned binary?

cross-project-tests is new. sym.test is very old (see above). It's hard to judge whether 1 would be fragile, but the general opinion of the LLD developers is that the output is likely to be fairly stable for simple things going forward, so I don't think it would be too fragile (see the initial discussion on the mailing list about bringing up that test-suite).

Option 2... might be feasible, but I don't readily know how to do that.

I'm afraid I don't know anything about gsym and therefore how to do this either. It should be possible though somehow.

Option 3... hm I am not sure what you imagine there? Generate a GSYM file without a corresponding binary? That might make sense for additional tests, but again, the test I added was purposefully using the same binary as sym.test.

Some of this point may not make sense at all, given my lack of GSYM knowledge, but see also my above comments.

In D105985#2886506, @jhenderson wrote:

In D105985#2882575, @simon.giesecke wrote:

A better approach might be to write the GSYM test from scratch, as if the DWARF test didn't exist, and then write a DWARF equivalent one (alternatively the other way around would also work). The important parts of sym.test could then be split off.

Hm, that makes sense. I'll give the cross-project-tests a look.

However, one general question that comes to my mind here: The addresses will probably be platform-specific. How can I deal with that? I can determine the addresses on my local (Linux amd64) platform, but how would I do the same generally? I can use nm to determine the address of main (which for the canned binary is 0x400540), but I need some offset to identify an inlined location (which is 0x40054d) for the canned binary. I probably can't generally assume that 0xd is an appropriate offset for any platform.

In D105985#2886546, @simon.giesecke wrote:

In D105985#2886506, @jhenderson wrote:

In D105985#2882575, @simon.giesecke wrote:

A better approach might be to write the GSYM test from scratch, as if the DWARF test didn't exist, and then write a DWARF equivalent one (alternatively the other way around would also work). The important parts of sym.test could then be split off.

Hm, that makes sense. I'll give the cross-project-tests a look.

However, one general question that comes to my mind here: The addresses will probably be platform-specific. How can I deal with that? I can determine the addresses on my local (Linux amd64) platform, but how would I do the same generally? I can use nm to determine the address of main (which for the canned binary is 0x400540), but I need some offset to identify an inlined location (which is 0x40054d) for the canned binary. I probably can't generally assume that 0xd is an appropriate offset for any platform.

I'd pin to a specific triple, e.g. x86-64 (along with appropriate REQUIRES), so that the addresses aren't going to change due to moving to a different machine.

dblaikie added a subscriber: dblaikie. · Jul 20 2021, 2:29 PM

@jhenderson I am struggling with even building cross-project-tests. How do I enable those? I tried calling cmake with either -DLLVM_TOOL_CROSS_PROJECT_TESTS_BUILD=TRUE or -DLLVM_ENABLE_PROJECTS="clang;cross-platform-tests", but those don't seem to do the job.

Add command line options to control GSYM behaviour.

Harbormaster completed remote builds in B115328: Diff 360473. · Jul 21 2021, 10:17 AM

simon.giesecke added a project: debug-info. · Jul 22 2021, 4:19 AM

In D105985#2892857, @simon.giesecke wrote:

@jhenderson I am struggling with even building cross-project-tests. How do I enable those? I tried calling cmake with either -DLLVM_TOOL_CROSS_PROJECT_TESTS_BUILD=TRUE or -DLLVM_ENABLE_PROJECTS="clang;cross-platform-tests", but those don't seem to do the job.

It's "cross-project-tests" not "cross-platform-tests" in the LLVM_ENABLE_PROJECTS listing.

If you're adding command-line options a) they might better belong in separate later patches, and b) the documentation should be updated to reference them.

In D105985#2899453, @jhenderson wrote:

In D105985#2892857, @simon.giesecke wrote:

@jhenderson I am struggling with even building cross-project-tests. How do I enable those? I tried calling cmake with either -DLLVM_TOOL_CROSS_PROJECT_TESTS_BUILD=TRUE or -DLLVM_ENABLE_PROJECTS="clang;cross-platform-tests", but those don't seem to do the job.

It's "cross-project-tests" not "cross-platform-tests" in the LLVM_ENABLE_PROJECTS listing.

Oh no, what a stupid mistake :( Thanks, that did the trick.

If you're adding command-line options a) they might better belong in separate later patches, and b) the documentation should be updated to reference them.

a) Hm, ok, this will just leave us without the option to disable GSYM usage after the first patch.

b) Sure, I'll add something to llvm/docs/CommandGuide/llvm-symbolizer.rst

In D105985#2899504, @simon.giesecke wrote:

If you're adding command-line options a) they might better belong in separate later patches, and b) the documentation should be updated to reference them.

a) Hm, ok, this will just leave us without the option to disable GSYM usage after the first patch.

Yes, this is true. I don't know if that's an issue or not though? What's the impact on a typical end user in that interim state? In other words, is it likely that something bad will happen?

In D105985#2899535, @jhenderson wrote:

In D105985#2899504, @simon.giesecke wrote:

If you're adding command-line options a) they might better belong in separate later patches, and b) the documentation should be updated to reference them.

a) Hm, ok, this will just leave us without the option to disable GSYM usage after the first patch.

Yes, this is true. I don't know if that's an issue or not though? What's the impact on a typical end user in that interim state? In other words, is it likely that something bad will happen?

Not really. A typical end user that didn't use llvm-gsymutil before won't be affected at all.

Revision Contents

Path	Size
llvm/include/llvm/DebugInfo/DIContext.h	6 lines
llvm/include/llvm/DebugInfo/GSYM/GsymDIContext.h	64 lines
llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h	2 lines
llvm/lib/DebugInfo/GSYM/CMakeLists.txt	1 line
llvm/lib/DebugInfo/GSYM/GsymDIContext.cpp	155 lines
llvm/lib/DebugInfo/Symbolize/CMakeLists.txt	1 line
llvm/lib/DebugInfo/Symbolize/Symbolize.cpp	74 lines
llvm/test/tools/llvm-symbolizer/Inputs/addr-gsymonly.exe
llvm/test/tools/llvm-symbolizer/Inputs/addr-gsymonly.exe.gsym
llvm/test/tools/llvm-symbolizer/sym-gsymonly.test	87 lines
llvm/utils/gn/secondary/llvm/lib/DebugInfo/GSYM/BUILD.gn	1 line

Diff 358875

llvm/include/llvm/DebugInfo/DIContext.h

@@ struct DIDumpOptions {
   std::function<void(Error)> RecoverableErrorHandler =
       WithColor::defaultErrorHandler;
   std::function<void(Error)> WarningHandler = WithColor::defaultWarningHandler;
 };

 class DIContext {
 public:
-  enum DIContextKind { CK_DWARF, CK_PDB };
+  enum DIContextKind {
+    CK_DWARF,
+    CK_PDB,
+    CK_GSYM,
+  };

   DIContext(DIContextKind K) : Kind(K) {}
   virtual ~DIContext() = default;

   DIContextKind getKind() const { return Kind; }

   virtual void dump(raw_ostream &OS, DIDumpOptions DumpOpts) = 0;

llvm/include/llvm/DebugInfo/GSYM/GsymDIContext.h

This file was added.

//===-- GsymDIContext.h --------------------------------------------------===//
//
// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
// See https://llvm.org/LICENSE.txt for license information.
// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
//
//===----------------------------------------------------------------------===/

#ifndef LLVM_DEBUGINFO_GSYM_GSYMDICONTEXT_H
#define LLVM_DEBUGINFO_GSYM_GSYMDICONTEXT_H

#include "llvm/DebugInfo/DIContext.h"
#include <cstdint>
#include <memory>
#include <string>

namespace llvm {

namespace gsym {

class GsymReader;

/// GSYM DI Context
/// This data structure is the top level entity that deals with GSYM
/// symbolication.
/// This data structure exists only when there is a need for a transparent
/// interface to different symbolication formats (e.g. GSYM, PDB and DWARF).
/// More control and power over the debug information access can be had by using
/// the GSYM interfaces directly.
class GsymDIContext : public DIContext {
public:
  GsymDIContext(std::unique_ptr<GsymReader> Reader);

  GsymDIContext(GsymDIContext &) = delete;
  GsymDIContext &operator=(GsymDIContext &) = delete;

  static bool classof(const DIContext *DICtx) {
    return DICtx->getKind() == CK_GSYM;
  }

  void dump(raw_ostream &OS, DIDumpOptions DIDumpOpts) override;

  DILineInfo getLineInfoForAddress(
      object::SectionedAddress Address,
      DILineInfoSpecifier Specifier = DILineInfoSpecifier()) override;
  DILineInfoTable getLineInfoForAddressRange(
      object::SectionedAddress Address, uint64_t Size,
      DILineInfoSpecifier Specifier = DILineInfoSpecifier()) override;
  DIInliningInfo getInliningInfoForAddress(
      object::SectionedAddress Address,
      DILineInfoSpecifier Specifier = DILineInfoSpecifier()) override;

  std::vector<DILocal>
  getLocalsForAddress(object::SectionedAddress Address) override;

private:
  const std::unique_ptr<GsymReader> Reader;
};

} // end namespace gsym

} // end namespace llvm

#endif // LLVM_DEBUGINFO_PDB_PDBCONTEXT_H
				Lint: Pre-merge checks: clang-tidy: warning: #endif for a header guard should reference the guard macro in a comment [llvm-header-guard]

llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h

		...
		ObjectFile *lookUpDsymFile(const std::string &Path,
const std::string &ArchName);		const std::string &ArchName);
ObjectFile *lookUpDebuglinkObject(const std::string &Path,		ObjectFile *lookUpDebuglinkObject(const std::string &Path,
const ObjectFile *Obj,		const ObjectFile *Obj,
const std::string &ArchName);		const std::string &ArchName);
ObjectFile *lookUpBuildIDObject(const std::string &Path,		ObjectFile *lookUpBuildIDObject(const std::string &Path,
const ELFObjectFileBase *Obj,		const ELFObjectFileBase *Obj,
const std::string &ArchName);		const std::string &ArchName);

		std::string lookUpGsymFile(const std::string &Path);

/// Returns pair of pointers to object and debug object.		/// Returns pair of pointers to object and debug object.
Expected<ObjectPair> getOrCreateObjectPair(const std::string &Path,		Expected<ObjectPair> getOrCreateObjectPair(const std::string &Path,
const std::string &ArchName);		const std::string &ArchName);

/// Return a pointer to object file at specified path, for a specified		/// Return a pointer to object file at specified path, for a specified
/// architecture (e.g. if path refers to a Mach-O universal binary, only one		/// architecture (e.g. if path refers to a Mach-O universal binary, only one
/// object file from it will be returned).		/// object file from it will be returned).
Expected<ObjectFile *> getOrCreateObject(const std::string &Path,		Expected<ObjectFile *> getOrCreateObject(const std::string &Path,

llvm/lib/DebugInfo/GSYM/CMakeLists.txt

	add_llvm_component_library(LLVMDebugInfoGSYM			add_llvm_component_library(LLVMDebugInfoGSYM
	DwarfTransformer.cpp			DwarfTransformer.cpp
	Header.cpp			Header.cpp
	FileWriter.cpp			FileWriter.cpp
	FunctionInfo.cpp			FunctionInfo.cpp
	GsymCreator.cpp			GsymCreator.cpp
				GsymDIContext.cpp
	GsymReader.cpp			GsymReader.cpp
	InlineInfo.cpp			InlineInfo.cpp
	LineTable.cpp			LineTable.cpp
	LookupResult.cpp			LookupResult.cpp
	ObjectFileTransformer.cpp			ObjectFileTransformer.cpp
	Range.cpp			Range.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS

llvm/lib/DebugInfo/GSYM/GsymDIContext.cpp

This file was added.

//===-- GsymDIContext.cpp ------------------------------------------------===//

// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.

// See https://llvm.org/LICENSE.txt for license information.

// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception

//===----------------------------------------------------------------------===/

#include "llvm/DebugInfo/GSYM/GsymDIContext.h"

#include "llvm/DebugInfo/GSYM/GsymReader.h"

#include "llvm/Support/Path.h"

using namespace llvm;

using namespace llvm::gsym;

GsymDIContext::GsymDIContext(std::unique_ptr<GsymReader> Reader)

: DIContext(CK_GSYM), Reader(std::move(Reader)) {}

void GsymDIContext::dump(raw_ostream &OS, DIDumpOptions DumpOpts) {}

static bool fillLineInfoFromLocation(const SourceLocation &Location,

DILineInfoSpecifier Specifier,

DILineInfo &LineInfo) {

// FIXME Demangle in case of DINameKind::ShortName

if (Specifier.FNKind != DINameKind::None) {

clayborgUnsubmitted

Not Done

DILineInfo &LineInfo) {

- LineInfo.FunctionName = static_cast<std::string>(Location.Name);

+ LineInfo.FunctionName = Location.Name.str();

switch (Specifier.FLIKind) {

If Specifier.FNKind is set to DINameKind::ShortName, we want to demangle the name into LineInfo.FunctionName. If it is set to DINameKind::LinkageName or DINameKind::None, set it to the current name. Demangling can be tricky, as there are many name manglings. Not sure if there is a one-stop shop to demangle all sorts of names in LLVM.


simon.gieseckeAuthorUnsubmitted

Done

Hm, what do you suggest to do here then? I think I am not knowledgeable enough to implement the demangling without further guidance.

Can we leave that to future work? In that case, should we fail here or fall back to not demangling, for now?

FWIW, I found that PDBContext returns the empty string in case of DINameKind::None.


simon.gieseckeAuthorUnsubmitted

Done

DWARFDie also returns an empty string in case of DINameKind::None, so I guess we should do the same here.
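The outcome of this thread (an empty name for DINameKind::None, the stored name for DINameKind::LinkageName, and a demangled name for DINameKind::ShortName) could look roughly like the sketch below. It is not the code from this patch: `NameKind`, `demangleOrKeep` and `selectFunctionName` are hypothetical stand-ins, and `abi::__cxa_demangle` (GCC/Clang runtime, Itanium ABI only) is used purely to keep the example self-contained. Inside LLVM one would more likely use the helpers declared in llvm/Demangle/Demangle.h, which Symbolize.cpp already includes. Note also that the Itanium demangler output includes the parameter list, which may differ from what other DIContext implementations report as a short name.

```cpp
#include <cstdlib>
#include <cxxabi.h>
#include <string>

// Hypothetical stand-in for llvm::DINameKind.
enum class NameKind { None, ShortName, LinkageName };

// Demangles an Itanium-ABI symbol; anything else is returned unchanged.
std::string demangleOrKeep(const std::string &Name) {
  if (Name.rfind("_Z", 0) != 0)
    return Name; // Not an Itanium-mangled name, e.g. a plain C symbol.
  int Status = 0;
  char *Buffer = abi::__cxa_demangle(Name.c_str(), nullptr, nullptr, &Status);
  if (Status != 0 || Buffer == nullptr)
    return Name; // Demangling failed; fall back to the stored name.
  std::string Result(Buffer);
  std::free(Buffer);
  return Result;
}

// Picks the function name to report for the requested name kind.
std::string selectFunctionName(const std::string &StoredName, NameKind Kind) {
  switch (Kind) {
  case NameKind::None:
    return std::string(); // Matches the empty-string behaviour discussed above.
  case NameKind::LinkageName:
    return StoredName;
  case NameKind::ShortName:
    return demangleOrKeep(StoredName);
  }
  return std::string();
}
```

Falling back to the stored name when demangling fails keeps the output usable for C symbols and for manglings this demangler does not understand.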


LineInfo.FunctionName = static_cast<std::string>(Location.Name);

clayborgUnsubmitted

Done

if (Specifier.FNKind != DINameKind::None) {

- LineInfo.FunctionName = static_cast<std::string>(Location.Name);

+ LineInfo.FunctionName = Location.Name.str();

}

switch (Specifier.FLIKind) {


}

switch (Specifier.FLIKind) {

case DILineInfoSpecifier::FileLineInfoKind::RelativeFilePath:

// We have no information to determine the relative path, so we fall back to

clayborgUnsubmitted

Done

case DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath: {

- SmallString<128> P(Location.Dir);

- sys::path::append(P, Location.Base);

- LineInfo.FileName = static_cast<std::string>(P);

+ if (Location.Dir.empty()) {

+ if (Location.Base.empty())

+ LineInfo.FileName = DILineInfo::BadString;

+ else

+ LineInfo.FileName = Location.Base.str();

+ } else {

+ SmallString<128> Path(Location.Dir);

+ sys::path::append(Path, Location.Base);

+ LineInfo.FileName = static_cast<std::string>(Path);

+ }

break;

We should watch out for an empty Location.Dir and also for an empty Location.Base.
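The empty-directory and empty-basename handling suggested here can be sketched without any LLVM types. Everything named below is hypothetical: `BadString` stands in for DILineInfo::BadString (its value here is an assumption), `buildFileName` is an illustrative helper, and a plain `/` separator replaces sys::path::append. Returning the directory alone when only the basename is missing is a choice made for this sketch, not something the review settles.

```cpp
#include <string>

// Stand-in for DILineInfo::BadString; the exact value is an assumption.
static const std::string BadString = "<invalid>";

// Joins directory and basename, tolerating either part being empty.
std::string buildFileName(const std::string &Dir, const std::string &Base) {
  if (Dir.empty())
    return Base.empty() ? BadString : Base; // Nothing, or basename only.
  if (Base.empty())
    return Dir; // Sketch-only choice: report the directory by itself.
  std::string Path = Dir;
  if (Path.back() != '/')
    Path += '/'; // Avoid doubling the separator.
  return Path + Base;
}
```

For example, `buildFileName("", "addr.c")` yields the bare basename, while `buildFileName("", "")` yields the placeholder string.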


// returning the absolute path.

case DILineInfoSpecifier::FileLineInfoKind::RawValue:

case DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath:

if (Location.Dir.empty()) {

if (Location.Base.empty())

clayborgUnsubmitted

Done

Move DILineInfoSpecifier::FileLineInfoKind::RawValue up to the DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath case. RawValue is described in comments as:

// RawValue is whatever the compiler stored in the filename table.  Could be
// a full path, could be something else.

So we should emit this as a full path of what was found in the GSYM


LineInfo.FileName = DILineInfo::BadString;

else

LineInfo.FileName = Location.Base.str();

} else {

SmallString<128> Path(Location.Dir);

sys::path::append(Path, Location.Base);

LineInfo.FileName = static_cast<std::string>(Path);

clayborgUnsubmitted

Done

The header file describes DILineInfoSpecifier::FileLineInfoKind::RelativeFilePath using:

// Relative to the compilation directory.

We don't have the compilation directory here, so I would just emit a full path, so move "case DILineInfoSpecifier::FileLineInfoKind::RelativeFilePath:" up to the case for "DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath".


}

break;

case DILineInfoSpecifier::FileLineInfoKind::BaseNameOnly:

LineInfo.FileName = static_cast<std::string>(Location.Base);

break;

clayborgUnsubmitted

Done

We don't have the source. I would avoid attempting to read it from disk as well because you might be symbolicating something from another machine or architecture.


default:

return false;

clayborgUnsubmitted

Done

Yes, set to zero


}

LineInfo.Line = Location.Line;

// We don't have information in GSYM to fill any of the Source, Column,

clayborgUnsubmitted

Done

yeah, we don't have this info


// StartFileName or StartLine attributes.

clayborgUnsubmitted

Done

We can. If it is set to DINameKind::ShortName, we want to demangle the name into LineInfo.FunctionName. If it is set to DINameKind::LinkageName or DINameKind::None, set it to the current name.


simon.gieseckeAuthorUnsubmitted

Done

Marking this as done, as this is the same discussion as above, I also moved the code comment up.


return true;

}

DILineInfo

GsymDIContext::getLineInfoForAddress(object::SectionedAddress Address,

DILineInfoSpecifier Specifier) {

if (Address.SectionIndex != object::SectionedAddress::UndefSection)

return {};

clayborgUnsubmitted

Done

Check the Address for a valid section index:

if (Address.SectionIndex != llvm::object::SectionedAddress::UndefSection)
  return {};


auto ResultOrErr = Reader->lookup(Address.Address);

clayborgUnsubmitted

Done

auto ResultOrErr = Reader->lookup(Address.Address);

- if (!ResultOrErr)

+ if (!ResultOrErr) {

+ consumeError(ResultOrErr.takeError());

return {};

+ }

const auto &Result = *ResultOrErr;

You need to consume the error, or this will crash.


if (!ResultOrErr) {

consumeError(ResultOrErr.takeError());

return {};

}

const auto &Result = *ResultOrErr;

clayborgUnsubmitted

Not Done

So the "Result.FuncName" is the name of the concrete function or symbol if there was no debug info for a symbol in the symbol table. So you only want to use this if "Result.Locations" is empty. If "Result.Locations" is not empty, you don't want to use that because the "Result.Locations.front()" might point to an inlined function. The name in the SourceLocation will be the inlined function name.

The main question is what other symbolizers do in this case. Let's say we are in the concrete function "main" and the "Result.Locations.front()" points to something like "std::vector<int>::empty()" at "/.../vector:123", do other symbolizers create a name that includes the inlined function + the concrete function? I would find an address in a DWARF file that points to an inlined function and symbolize it using the DWARF and see what gets output for the DWARF.

The other issue here is that GSYM can return multiple locations for a single address since GSYM will unwind the inline call stack. The LookupResult contains an array of locations. Do we not want to convey this back? Or does the LLVM symbolizer always want the deepest inline function for a given address?

But seeing as below we get the inlining information in GsymDIContext::getInliningInfoForAddress(...), maybe we should always be returning the Result.FuncName? It really depends on what the other symbolizers do. I would make a small DWARF file, convert it to GSYM, then do lookups using the GSYM and the DWARF and making sure the DI classes match.


simon.gieseckeAuthorUnsubmitted

Done

The other issue here is that GSYM can return multiple locations for a single address since GSYM will unwind the inline call stack. The LookupResult contains an array of locations. Do we not want to convey this back? Or does the LLVM symbolizer always want the deepest inline function for a given address?

Well, there's getInliningInfoForAddress which returns all inline locations, which is used when the -inlines option is set.

When it is not set, I implemented the same behaviour here as the DWARF implementation as checked by sym.test. I don't know what's the rationale for that.

I would make a small DWARF file, convert it to GSYM, then do lookups using the GSYM and the DWARF and making sure the DI classes match.

That's basically what I did. I took the existing addr.exe from the sym.test test case, converted that to GSYM and stripped DWARF from the object file, and ensured that the test cases create the same output, which they do, except for the missing column information. This covers the inline case. Not sure if there are any tests that check consistency of the DWARF vs. PDB behaviour here.
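The fallback discussed in this thread can be modelled in a few lines. The types below are hypothetical stand-ins for the GSYM lookup result and its source locations, not the real classes, and the sketch assumes (as the comments above suggest) that the first location is the innermost, possibly inlined, frame.

```cpp
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-ins for a GSYM source location and lookup result.
struct Frame {
  std::string Name;
  std::string File;
  uint32_t Line;
};

struct Lookup {
  std::string FuncName;         // Concrete function or symbol-table name.
  std::vector<Frame> Locations; // Assumed order: innermost frame first.
};

// Returns the single frame reported when inlined frames are not requested.
Frame innermostFrame(const Lookup &Result) {
  if (Result.Locations.empty())
    return Frame{Result.FuncName, "", 0}; // Symbol only, no line info.
  return Result.Locations.front();
}
```

With inlined frames requested, a caller would instead walk all of `Locations`, which corresponds to the separate getInliningInfoForAddress path mentioned above.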


DILineInfo LineInfo;

if (Result.Locations.empty()) {

// No debug info for this, we just had a symbol from the symbol table.

clayborgUnsubmitted

Done

// make a difference?

- if (!Result.Locations.empty()) {

+ if (Result.Locations.empty()) {

+ // No debug info for this, we just had a symbol from the symbol table.

+ LineInfo.FunctionName = Result.FuncName.str();

+ } else {

if (!fillLineInfoFromLocation(Result.Locations.front(), Specifier,

LineInfo))

return {};

}

LineInfo.StartAddress = Result.FuncRange.Start;


// FIXME Demangle in case of DINameKind::ShortName

if (Specifier.FNKind != DINameKind::None) {

LineInfo.FunctionName = Result.FuncName.str();

}

} else {

if (!fillLineInfoFromLocation(Result.Locations.front(), Specifier,

LineInfo))

return {};

}

LineInfo.StartAddress = Result.FuncRange.Start;

return LineInfo;

}

DILineInfoTable

clayborgUnsubmitted

Done

If this isn't used, is there a reason we are converting it?


simon.gieseckeAuthorUnsubmitted

Done

Well, this is an implementation of the DIContext interface, and ideally we would have a full implementation of the interface (or we should at least have a way to indicate that we only have a partial implementation?). The function is called in the context of JIT, by several event listeners and from llvm-rtdyld right now. I wonder if GSYM support is required there, but if only DWARF is supported there anyway, why does that use the format-neutral DIContext interface?

I didn't intend to implement this as part of this patch in any case.


GsymDIContext::getLineInfoForAddressRange(object::SectionedAddress Address,

uint64_t Size,

clayborgUnsubmitted

Done

If you wanted to do this you would get the full gsym::FunctionInfo from the GsymReader and convert any addresses that fall into this range.

if (Address.SectionIndex != llvm::object::SectionedAddress::UndefSection)
  return DILineInfoTable();

if (auto LineTableOrErr = Reader->getFunctionInfo(Address.Address)) {
  // Iterate over line table entries and take the ones that fall in the range.
} else {
  consumeError(LineTableOrErr.takeError());
  return DILineInfoTable();
}


clayborgUnsubmitted

Done

As long as we are doing this layer, we might as well fill this in. The code above is a good start. Something like:

if (Address.SectionIndex != llvm::object::SectionedAddress::UndefSection)
  return DILineInfoTable();

if (auto FuncInfoOrErr = Reader->getFunctionInfo(Address.Address)) {
  if (FuncInfoOrErr->OptLineTable) {
    const gsym::LineTable &LT = *FuncInfoOrErr->OptLineTable;
    const uint64_t StartAddr = Address.Address;
    const uint64_t EndAddr = Address.Address + Size;
    for (const gsym::LineEntry &LineEntry : LT) {
      if (StartAddr <= LineEntry.Addr && LineEntry.Addr < EndAddr) {
        // Use LineEntry.Addr, LineEntry.File (which is a file index into the 
        // files tables from the GsymReader), and LineEntry.Line (source line
        // number) to add stuff to the DILineInfoTable
      }
    }
  }
} else {
  consumeError(FuncInfoOrErr.takeError());
  return DILineInfoTable();
}
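The filtering step in the outline above, keeping only line entries whose address falls inside the half-open range starting at the requested address, can be tried in isolation. This is a sketch under assumptions: `LineEntry` here is a hypothetical plain struct with the three fields the comment mentions, and `entriesInRange` is an illustrative helper rather than part of GsymReader.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical stand-in for a GSYM line table row.
struct LineEntry {
  uint64_t Addr; // Start address of the row.
  uint32_t File; // Index into the file table.
  uint32_t Line; // Source line number.
};

// Returns the rows whose address lies in [Start, Start + Size).
std::vector<LineEntry> entriesInRange(const std::vector<LineEntry> &Table,
                                      uint64_t Start, uint64_t Size) {
  std::vector<LineEntry> Result;
  for (const LineEntry &Entry : Table) {
    // Comparing the offset against Size avoids overflow in Start + Size.
    if (Entry.Addr >= Start && Entry.Addr - Start < Size)
      Result.push_back(Entry);
  }
  return Result;
}
```

Each surviving row would then be turned into one DILineInfoTable entry; a Size of zero naturally selects nothing, matching the early return in the patch.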


simon.gieseckeAuthorUnsubmitted

Done

Ok.


DILineInfoSpecifier Specifier) {

if (Size == 0)

return DILineInfoTable();

DILineInfoTable Table;

// FIXME Implement this as well. It's not used by Symbolize though.

// auto LineNumbers = Session->findLineNumbersByAddress(Address.Address,

// Size); if (!LineNumbers || LineNumbers->getChildCount() == 0)

// return Table;

// while (auto LineInfo = LineNumbers->getNext()) {

// DILineInfo LineEntry = getLineInfoForAddress(

// {LineInfo->getVirtualAddress(), Address.SectionIndex}, Specifier);

// Table.push_back(std::make_pair(LineInfo->getVirtualAddress(),

// LineEntry));

// }

return Table;

}

DIInliningInfo

GsymDIContext::getInliningInfoForAddress(object::SectionedAddress Address,

DILineInfoSpecifier Specifier) {

auto ResultOrErr = Reader->lookup(Address.Address);

if (!ResultOrErr)

return {};

const auto &Result = *ResultOrErr;

DIInliningInfo InlineInfo;

for (const auto &Location : Result.Locations) {

DILineInfo LineInfo;

if (!fillLineInfoFromLocation(Location, Specifier, LineInfo))

return {};

// Hm, that's probably something that should only be filled in the first or

// last frame?

LineInfo.StartAddress = Result.FuncRange.Start;

InlineInfo.addFrame(LineInfo);

}

return InlineInfo;

}

std::vector<DILocal>

GsymDIContext::getLocalsForAddress(object::SectionedAddress Address) {

// We can't implement this, there's no such information in the GSYM file.

return std::vector<DILocal>();

}

llvm/lib/DebugInfo/Symbolize/CMakeLists.txt

	add_llvm_component_library(LLVMSymbolize			add_llvm_component_library(LLVMSymbolize
	DIPrinter.cpp			DIPrinter.cpp
	SymbolizableObjectFile.cpp			SymbolizableObjectFile.cpp
	Symbolize.cpp			Symbolize.cpp

	ADDITIONAL_HEADER_DIRS			ADDITIONAL_HEADER_DIRS
	${LLVM_MAIN_INCLUDE_DIR}/llvm/DebugInfo/Symbolize			${LLVM_MAIN_INCLUDE_DIR}/llvm/DebugInfo/Symbolize

	LINK_COMPONENTS			LINK_COMPONENTS
	DebugInfoDWARF			DebugInfoDWARF
				DebugInfoGSYM
	DebugInfoPDB			DebugInfoPDB
	Object			Object
	Support			Support
	Demangle			Demangle
	)			)

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp

#include "llvm/DebugInfo/Symbolize/Symbolize.h"		#include "llvm/DebugInfo/Symbolize/Symbolize.h"

#include "SymbolizableObjectFile.h"		#include "SymbolizableObjectFile.h"

#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/BinaryFormat/COFF.h"		#include "llvm/BinaryFormat/COFF.h"
#include "llvm/Config/config.h"		#include "llvm/Config/config.h"
#include "llvm/DebugInfo/DWARF/DWARFContext.h"		#include "llvm/DebugInfo/DWARF/DWARFContext.h"
		#include "llvm/DebugInfo/GSYM/GsymDIContext.h"
		#include "llvm/DebugInfo/GSYM/GsymReader.h"
#include "llvm/DebugInfo/PDB/PDB.h"		#include "llvm/DebugInfo/PDB/PDB.h"
#include "llvm/DebugInfo/PDB/PDBContext.h"		#include "llvm/DebugInfo/PDB/PDBContext.h"
#include "llvm/Demangle/Demangle.h"		#include "llvm/Demangle/Demangle.h"
#include "llvm/Object/COFF.h"		#include "llvm/Object/COFF.h"
#include "llvm/Object/MachO.h"		#include "llvm/Object/MachO.h"
#include "llvm/Object/MachOUniversal.h"		#include "llvm/Object/MachOUniversal.h"
#include "llvm/Support/CRC.h"		#include "llvm/Support/CRC.h"
#include "llvm/Support/Casting.h"		#include "llvm/Support/Casting.h"
		...
		ObjectFile *LLVMSymbolizer::lookUpBuildIDObject(const std::string &Path,
		...
auto DbgObjOrErr = getOrCreateObject(DebugBinaryPath, ArchName);		auto DbgObjOrErr = getOrCreateObject(DebugBinaryPath, ArchName);
if (!DbgObjOrErr) {		if (!DbgObjOrErr) {
consumeError(DbgObjOrErr.takeError());		consumeError(DbgObjOrErr.takeError());
return nullptr;		return nullptr;
}		}
return DbgObjOrErr.get();		return DbgObjOrErr.get();
}		}

		std::string LLVMSymbolizer::lookUpGsymFile(const std::string &Path) {
		// TODO We should provide an option to provide an alternative directory for
		// GSYM files.

		const auto GsymPath = Path + ".gsym";

		sys::fs::file_status Status;
		if (std::error_code EC = llvm::sys::fs::status(GsymPath, Status))
		return {};

		if (llvm::sys::fs::is_directory(Status))
		return {};

		// TODO Also check if GSYM file is up-to-date?

		return GsymPath;
		}

Expected<LLVMSymbolizer::ObjectPair>		Expected<LLVMSymbolizer::ObjectPair>
LLVMSymbolizer::getOrCreateObjectPair(const std::string &Path,		LLVMSymbolizer::getOrCreateObjectPair(const std::string &Path,
const std::string &ArchName) {		const std::string &ArchName) {
auto I = ObjectPairForPathArch.find(std::make_pair(Path, ArchName));		auto I = ObjectPairForPathArch.find(std::make_pair(Path, ArchName));
if (I != ObjectPairForPathArch.end())		if (I != ObjectPairForPathArch.end())
return I->second;		return I->second;

auto ObjOrErr = getOrCreateObject(Path, ArchName);		auto ObjOrErr = getOrCreateObject(Path, ArchName);
		...
		LLVMSymbolizer::getOrCreateModuleInfo(const std::string &ModuleName) {
		...
if (!ObjectsOrErr) {		if (!ObjectsOrErr) {
// Failed to find valid object file.		// Failed to find valid object file.
Modules.emplace(ModuleName, std::unique_ptr<SymbolizableModule>());		Modules.emplace(ModuleName, std::unique_ptr<SymbolizableModule>());
return ObjectsOrErr.takeError();		return ObjectsOrErr.takeError();
}		}
ObjectPair Objects = ObjectsOrErr.get();		ObjectPair Objects = ObjectsOrErr.get();

std::unique_ptr<DIContext> Context;		std::unique_ptr<DIContext> Context;
// If this is a COFF object containing PDB info, use a PDBContext to		// TODO There should be an option that disables the preference for GSYM.
// symbolize. Otherwise, use DWARF.
		// Create a DIContext to symbolize as follows:
		// - If there is a GSYM file, create a GsymDIContext.
		// - Otherwise, if this is a COFF object containing PDB info, create a
		// PDBContext.
		// - Otherwise, create a DWARFContext.
		const auto GsymFile = lookUpGsymFile(BinaryName);
		if (!GsymFile.empty()) {
		auto ReaderOrErr = gsym::GsymReader::openFile(GsymFile);

		if (ReaderOrErr) {
		std::unique_ptr<gsym::GsymReader> Reader =
		std::make_unique<gsym::GsymReader>(std::move(*ReaderOrErr));

		Context = std::make_unique<gsym::GsymDIContext>(std::move(Reader));
		}
		}
		if (!Context) {
if (auto CoffObject = dyn_cast<COFFObjectFile>(Objects.first)) {		if (auto CoffObject = dyn_cast<COFFObjectFile>(Objects.first)) {
		Lint: Pre-merge checks: clang-tidy: warning: 'auto CoffObject' can be declared as 'const auto *CoffObject' [llvm-qualified-auto]
const codeview::DebugInfo *DebugInfo;		const codeview::DebugInfo *DebugInfo;
StringRef PDBFileName;		StringRef PDBFileName;
auto EC = CoffObject->getDebugPDBInfo(DebugInfo, PDBFileName);		auto EC = CoffObject->getDebugPDBInfo(DebugInfo, PDBFileName);
if (!EC && DebugInfo != nullptr && !PDBFileName.empty()) {		if (!EC && DebugInfo != nullptr && !PDBFileName.empty()) {
using namespace pdb;		using namespace pdb;
std::unique_ptr<IPDBSession> Session;		std::unique_ptr<IPDBSession> Session;

PDB_ReaderType ReaderType =		PDB_ReaderType ReaderType =
Opts.UseDIA ? PDB_ReaderType::DIA : PDB_ReaderType::Native;		Opts.UseDIA ? PDB_ReaderType::DIA : PDB_ReaderType::Native;
if (auto Err = loadDataForEXE(ReaderType, Objects.first->getFileName(),		if (auto Err = loadDataForEXE(ReaderType, Objects.first->getFileName(),
Session)) {		Session)) {
Modules.emplace(ModuleName, std::unique_ptr<SymbolizableModule>());		Modules.emplace(ModuleName, std::unique_ptr<SymbolizableModule>());
// Return along the PDB filename to provide more context		// Return along the PDB filename to provide more context
return createFileError(PDBFileName, std::move(Err));		return createFileError(PDBFileName, std::move(Err));
}		}
Context.reset(new PDBContext(*CoffObject, std::move(Session)));		Context.reset(new PDBContext(*CoffObject, std::move(Session)));
}		}
}		}
		}
if (!Context)		if (!Context)
Context = DWARFContext::create(*Objects.second, nullptr, Opts.DWPName);		Context = DWARFContext::create(*Objects.second, nullptr, Opts.DWPName);
return createModuleInfo(Objects.first, std::move(Context), ModuleName);		return createModuleInfo(Objects.first, std::move(Context), ModuleName);
}		}

Expected<SymbolizableModule *>		Expected<SymbolizableModule *>
LLVMSymbolizer::getOrCreateModuleInfo(const ObjectFile &Obj) {		LLVMSymbolizer::getOrCreateModuleInfo(const ObjectFile &Obj) {
StringRef ObjName = Obj.getFileName();		StringRef ObjName = Obj.getFileName();

llvm/test/tools/llvm-symbolizer/Inputs/addr-gsymonly.exe

llvm/test/tools/llvm-symbolizer/Inputs/addr-gsymonly.exe.gsym

llvm/test/tools/llvm-symbolizer/sym-gsymonly.test

This file was added.

				#Source:
				##include <stdio.h>
jhendersonUnsubmitted

Done

Probably worth a top-level comment explaining what this test is supposed to be testing.

Also probably you can just rename this test "gsym.test". I'm not sure what the "only" bit is about, and the "sym-" prefix similarly doesn't look to add any meaning.
simon.gieseckeAuthorUnsubmitted

Done

> Probably worth a top-level comment explaining what this test is supposed to be testing.

Definitely, I'll add that.

> Also probably you can just rename this test "gsym.test". I'm not sure what the "only" bit is about, and the "sym-" prefix similarly doesn't look to add any meaning.

"gsymonly" refers to the fact that the binary doesn't have DWARF debug info but only a corresponding GSYM file. Another case would be that we have both a GSYM file and DWARF debug info. This should probably be tested as well.
				#static inline int inctwo (int *a) {
				# printf ("%d\n",(*a)++);
jhendersonUnsubmitted

Not Done

I'd avoid referencing other tests as part of this comment: sym.test has been on my wishlist as something I'd love to rewrite if I had the time, due to its usage of precanned binaries, conflation of testing of multiple unrelated options and general vagueness of the test intent. Ideally, we'd avoid mirroring it entirely, in favour of testing the things that need to be tested specifically for this format.
				# return (*a)++;
				#}
				#static inline int inc (int *a) {
				# printf ("%d\n",inctwo(a));
				# return (*a)++;
				#}
				#
				#
				#int main () {
				# int x = 1;
				# return inc(&x);
				#}
				#
				#Build as : clang -g -O2 addr.c

				RUN: llvm-symbolizer -print-address -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp | FileCheck %s
				RUN: llvm-symbolizer -addresses -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp | FileCheck %s
jhendersonUnsubmitted

Not Done

Nit: I'd normalise the comment markers throughout this test, using the following two rules:

1) True comments start with `##` (followed by a space). Applies for the source above. This helps distinguish the test details from actual comments.
2) Add `#` as a comment marker to all RUN lines, much as we do in the vast majority of newer tests. Follow it with a space again. CHECK and equivalent lines should be `# CHECK` (with space).

Finally, as noted out-of-line, I've not looked at the implementation, but do you need all these individual test-cases for gsym testing? Most of them look more like they're testing generic symbolizer behaviour, which is therefore already covered elsewhere.
simon.gieseckeAuthorUnsubmitted

Done

As I mentioned above, this is indeed a copy of sym.test, using a different input binary. The results should be the same, except for the fact that we don't get any column information from GSYM.

I checked the list of test cases again, and arguably some tests seem to only test the formatting of the output, and this is somehow orthogonal to the data source used for symbolication. But at least some are passed into the `DIContext` implementation (inlining, basename, ...), and I am not sure if we should make assumptions on the implementation detail of which command line options interact with the `DIContext` implementation at this level.

If you think specific test cases should be removed here, please suggest them.
jhendersonUnsubmitted

Not Done

I'm a bit constrained on time at the moment, so can't really dig into the source to look at what makes sense to test. Perhaps @clayborg has some suggestions there. I think it's reasonable to make assumptions based on the current implementation, about what is orthogonal and what isn't. If people refactor the area, to change where information is retrieved, it's probably reasonable to expect them to ensure test coverage is still sufficient, but I don't really know.

You certainly should be able to drop the llvm-addr2line cases: llvm-addr2line is basically llvm-symbolizer with some different defaults on the options, and one or two minor formatting differences. The underlying source of information is irrelevant. Similarly, you can drop testing that different aliases mean the same thing, as these are tested elsewhere.
				RUN: llvm-symbolizer -a -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp | FileCheck %s
				RUN: llvm-symbolizer -inlining -print-address -pretty-print -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp | FileCheck -check-prefix="PRETTY" %s
				RUN: llvm-symbolizer -inlining -print-address -p -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp | FileCheck -check-prefix="PRETTY" %s
				RUN: llvm-symbolizer -inlines -print-address -pretty-print -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp | FileCheck -check-prefix="PRETTY" %s
				RUN: llvm-symbolizer -inlines -print-address -p -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp | FileCheck -check-prefix="PRETTY" %s
				RUN: llvm-symbolizer -i -print-address -pretty-print -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp | FileCheck -check-prefix="PRETTY" %s
				RUN: llvm-symbolizer -i -print-address -p -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp | FileCheck -check-prefix="PRETTY" %s
				## Before 2020-08-04, asan_symbolize.py passed --inlining=true.
				## Support this compatibility alias for a while.
				RUN: llvm-symbolizer --inlining=true --print-address -p --obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp \| FileCheck -check-prefix="PRETTY" %s

				RUN: echo "0x1" > %t.input
				RUN: llvm-symbolizer -obj=%p/Inputs/zero < %t.input \| FileCheck -check-prefix="ZERO" %s

				RUN: llvm-addr2line -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp \| FileCheck -check-prefix=A2L %s
				RUN: llvm-addr2line -a -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp \| FileCheck -check-prefixes=A2L,A2L_A %s
				RUN: llvm-addr2line -f -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp \| FileCheck -check-prefixes=A2L,A2L_F %s
				RUN: llvm-addr2line -i -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp \| FileCheck -check-prefixes=A2L,A2L_I %s
				RUN: llvm-addr2line -fi -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp \| FileCheck -check-prefixes=A2L,A2L_F,A2L_I,A2L_FI %s

				RUN: llvm-addr2line -pa -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp \| FileCheck -check-prefixes=A2LP,A2LP_A %s
				RUN: llvm-addr2line -pf -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp \| FileCheck -check-prefixes=A2LP,A2LP_F %s
				RUN: llvm-addr2line -paf -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp \| FileCheck -check-prefixes=A2LP,A2LP_AF %s
				RUN: llvm-addr2line -pai -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp \| FileCheck -check-prefixes=A2LP,A2LP_A,A2LP_I %s
				RUN: llvm-addr2line -pfi -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp \| FileCheck -check-prefixes=A2LP,A2LP_F,A2LP_FI %s
				RUN: llvm-addr2line -pafi -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp \| FileCheck -check-prefixes=A2LP,A2LP_AF,A2LP_FI %s

				# CHECK: some text
				# CHECK-NEXT: 0x40054d
				# CHECK-NEXT: inctwo
				# CHECK-NEXT: {{[/\]+}}tmp{{[/\]+}}x.c:3:0
				# CHECK-NEXT: inc
				# CHECK-NEXT: {{[/\]+}}tmp{{[/\]+}}x.c:7:0
				# CHECK-NEXT: main
				# CHECK-NEXT: {{[/\]+}}tmp{{[/\]+}}x.c:14:0
				# CHECK-EMPTY:
				# CHECK-NEXT: some text2
				#
				#PRETTY: some text
				#PRETTY: {{[0x]+}}40054d: inctwo at {{[/\]+}}tmp{{[/\]+}}x.c:3:0
				#PRETTY: (inlined by) inc at {{[/\]+}}tmp{{[/\]+}}x.c:7:0
				#PRETTY: (inlined by) main at {{[/\]+}}tmp{{[/\]+}}x.c:14:0
				#PRETTY: some text2
				#
				#ZERO: ??
				#ZERO: ??:0:0
				#
				#A2L: some text
				#A2L_A-NEXT: 0x40054d
				#A2L_F-NEXT: inctwo
				#A2L-NEXT: {{[/\]+}}tmp{{[/\]+}}x.c:3{{$}}
				#A2L_FI-NEXT: inc{{$}}
				#A2L_I-NEXT: {{[/\]+}}tmp{{[/\]+}}x.c:7{{$}}
				#A2L_FI-NEXT: main
				#A2L_I-NEXT: {{[/\]+}}tmp{{[/\]+}}x.c:14{{$}}
				#A2L-NEXT: some text2

				#A2LP: some text
				#A2LP_A-NEXT: 0x40054d: {{[/\]+}}tmp{{[/\]+}}x.c:3{{$}}
				#A2LP_F-NEXT: inctwo at {{[/\]+}}tmp{{[/\]+}}x.c:3{{$}}
				#A2LP_AF-NEXT: 0x40054d: inctwo at {{[/\]+}}tmp{{[/\]+}}x.c:3{{$}}
				#A2LP_I-NEXT: {{[/\]+}}tmp{{[/\]+}}x.c:7{{$}}
				#A2LP_I-NEXT: {{[/\]+}}tmp{{[/\]+}}x.c:14{{$}}
				#A2LP_FI-NEXT: (inlined by) inc at {{[/\]+}}tmp{{[/\]+}}x.c:7{{$}}
				#A2LP_FI-NEXT: (inlined by) main at {{[/\]+}}tmp{{[/\]+}}x.c:14{{$}}
				#A2LP-NEXT: some text2
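
Following jhenderson's inline suggestion, a reduced version of this test could drop the llvm-addr2line runs and the runs that only differ in option aliases. The lines below are a minimal sketch of that idea, not part of the revision. They reuse only commands and check lines that already appear in this diff. Which of the remaining runs actually exercise the `DIContext` implementation is an assumption that has not been checked against the source, and the `ZERO` run is left out here only because it reads `Inputs/zero` rather than the GSYM-only binary.

```
## Sketch only: one plain run and one inlining/pretty-print run against the GSYM-only input.
RUN: llvm-symbolizer -a -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp | FileCheck %s
RUN: llvm-symbolizer -inlining -print-address -pretty-print -obj=%p/Inputs/addr-gsymonly.exe < %p/Inputs/addr.inp | FileCheck -check-prefix="PRETTY" %s

# CHECK: some text
# CHECK-NEXT: 0x40054d
# CHECK-NEXT: inctwo
# CHECK-NEXT: {{[/\]+}}tmp{{[/\]+}}x.c:3:0
# CHECK-NEXT: inc
# CHECK-NEXT: {{[/\]+}}tmp{{[/\]+}}x.c:7:0
# CHECK-NEXT: main
# CHECK-NEXT: {{[/\]+}}tmp{{[/\]+}}x.c:14:0
# CHECK-EMPTY:
# CHECK-NEXT: some text2
#
#PRETTY: some text
#PRETTY: {{[0x]+}}40054d: inctwo at {{[/\]+}}tmp{{[/\]+}}x.c:3:0
#PRETTY: (inlined by) inc at {{[/\]+}}tmp{{[/\]+}}x.c:7:0
#PRETTY: (inlined by) main at {{[/\]+}}tmp{{[/\]+}}x.c:14:0
#PRETTY: some text2
```

The column is `0` in every check line because, as noted in the thread, GSYM provides no column information.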

llvm/utils/gn/secondary/llvm/lib/DebugInfo/GSYM/BUILD.gn

 static_library("GSYM") {
   output_name = "LLVMDebugInfoGSYM"
   deps = [
     "//llvm/lib/MC",
     "//llvm/lib/Support",
   ]
   sources = [
     "DwarfTransformer.cpp",
     "FileWriter.cpp",
     "FunctionInfo.cpp",
     "GsymCreator.cpp",
+    "GsymDIContext.cpp",
     "GsymReader.cpp",
     "Header.cpp",
     "InlineInfo.cpp",
     "LineTable.cpp",
     "LookupResult.cpp",
     "ObjectFileTransformer.cpp",
     "Range.cpp",
   ]
 }

Revision Contents

Diff 358875

llvm/include/llvm/DebugInfo/DIContext.h

llvm/include/llvm/DebugInfo/GSYM/GsymDIContext.h

llvm/include/llvm/DebugInfo/Symbolize/Symbolize.h

llvm/lib/DebugInfo/GSYM/CMakeLists.txt

llvm/lib/DebugInfo/GSYM/GsymDIContext.cpp

llvm/lib/DebugInfo/Symbolize/CMakeLists.txt

llvm/lib/DebugInfo/Symbolize/Symbolize.cpp

llvm/test/tools/llvm-symbolizer/Inputs/addr-gsymonly.exe

llvm/test/tools/llvm-symbolizer/Inputs/addr-gsymonly.exe.gsym

llvm/test/tools/llvm-symbolizer/sym-gsymonly.test

llvm/utils/gn/secondary/llvm/lib/DebugInfo/GSYM/BUILD.gn
