Added GSYM-only tests.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
There are several TODO markers in the code. Please provide some feedback on those.
Resolving duplication between getInliningInfoForAddress and getLineInfoForAddress is something I am currently taking care of.
Note that sym-gsymonly.test is mostly a copy of sym.test & using the different input file. The only other difference is that I removed the column number for main, which we don't have available from the GSYM file, IIUC.
llvm/lib/DebugInfo/GSYM/GsymDIContext.cpp | ||
---|---|---|
26 | If Specifier.FNKind is set to DINameKind::ShortName, we want to demangle the name into LineInfo.FunctionName. If it is set to DINameKind::LinkageName or DINameKind::None, set to to the current name. Demangling can be tricky, as there are many name manglings. Not sure if there is a one stop shop to demangle all sorts of names in LLVM. | |
30–32 | We should watch out for an empty Location.Dir and also for an empty Location.Base. | |
37 | Move DILineInfoSpecifier::FileLineInfoKind::RawValue up to the DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath case. RawValue is described in comments as: // RawValue is whatever the compiler stored in the filename table. Could be // a full path, could be something else. So we should emit this as a full path of what was found in the GSYM | |
43–44 | The header file describes DILineInfoSpecifier::FileLineInfoKind::RelativeFilePath using: // Relative to the compilation directory. We don't have the compilation directory here, so I would just emit a full path, so move "case DILineInfoSpecifier::FileLineInfoKind::RelativeFilePath:" up to the case for "DILineInfoSpecifier::FileLineInfoKind::AbsoluteFilePath". | |
50 | We don't have the source. I would avoid attempting to read it from disk as well because you might be symbolicating something from another machine or architecture. | |
53 | Yes, set to zero | |
55–57 | yeah, we don't have this info | |
59 | We can. If it is set to DINameKind::ShortName, we want to demangle the name into LineInfo.FunctionName. If it is set to DINameKind::LinkageName or DINameKind::None, set to to the current name. | |
67 | Check for a Address for a valid section index: if (Address.SectionIndex != llvm::object::SectionedAddress::UndefSection) return {}; | |
69–70 | You need to consume the error, or this will crash. | |
76–77 | So the "Result.FuncName" is the name of the concrete function or symbol if there was no debug info for a symbol in the symbol table. So you only want to use this if "Result.Locations" is empty. If "Result.locations" is not empty, you don't want to use that because the "Result.Locations.front()" might point to a inlined function. The name in the SourceLocation will be the inlined function name. The main question is what other symbolizers do in this case. Lets say we are in the concrete function "main" and the "Result.Locations.front()" points to something line "std::vector<int>::empty()" at "/.../vector:123", do other symbolizers create a name the includes the inlined function + the concrete function? I would find an address in a DWARF file that points to an inlined function and symbolize it using the DWARF and see what gets output for the DWARF. The other issue here is that GSYM can return multiple locations for a single address since GSYM will unwind the inline call stack. The LookupResult contains an array of locations. Do we not want to convey this back? Or does the LLVM symbolizer always want the deepest inline function for a given address? But seeing as below we get the inlining information in GsymDIContext::getInliningInfoForAddress(...), maybe we should always be returning the Result.FuncName? It really depends on what the other symbolizers do. I would make a small DWARF file, convert it to GSYM, then do lookups using the GSYM and the DWARF and making sure the DI classes match. | |
78–82 | ||
98 | If this isn't used, is there a reason we are converting it? | |
100 | If you wanted to do this you would get the full gsym::FunctionInfo from the GsymReader and convert any addresses that fall into this range. if (Address.SectionIndex != llvm::object::SectionedAddress::UndefSection) return DILineInfoTable(); if (auto LineTableOrErr = Reader->getFunctionInfo(Address.Address)) { // Iterate over line table entries and take the ones that fall in the range. } else { consumeError(LineTableOrErr.takeError()); return DILineInfoTable(); } |
I have zero experience with or knowledge about gsym, so I can't comment on the implementation details, but some higher-level comments:
- Make sure to review the llvm-symbolizer documentation to see if there's anything there that needs adding/updating given this change (there may not be).
- Can you avoid using pre-canned binaries for the test input, by somehow generating the inputs on the fly? The existing pre-canned binaries in the llvm-symbolizer tests are already sub-optimal, in my opinion, as they are opaque and make testing less flexible.
llvm/test/tools/llvm-symbolizer/sym-gsymonly.test | ||
---|---|---|
2 | Probably worth a top-level comment explaining what this test is supposed to be testing. Also probably you can just rename this test "gsym.test". I'm not sure what the "only" bit is about, and the "sym-" prefix similarly doesn't look to add any meaning. | |
21 | Nit: I'd normalise the comment markers throughout this test, using the following two rules:
Finally, as noted out-of-line, I've not looked at the implementation, but do you need all these individual test-cases for gsym testing? Most of them look more like they're testing generic symbolizer behaviour, which is therefore already covered elsewhere. |
Sure, I'll check that.
- Can you avoid using pre-canned binaries for the test input, by somehow generating the inputs on the fly? The existing pre-canned binaries in the llvm-symbolizer tests are already sub-optimal, in my opinion, as they are opaque and make testing less flexible.
The sym-gsymonly test is mostly a copy of the sym test, and the binary is just the stripped binary. These additional files don't need to be part of the repository, we could run llvm-gsymutil --convert and strip DWARF from it as part of the test.
If this should be migrated to remove the use of the canned binary entirely, I fear I am not knowledgable enough on the test infrastructure to do that.
This should probably have been stated in the test file.
llvm/lib/DebugInfo/GSYM/GsymDIContext.cpp | ||
---|---|---|
26 | Hm, what do you suggest to do here then? I think I am not knowledgeable enough to implement the demangling without further guidance. Can we leave that to future work? In that case, should we fail here or fall back to not demangling, for now? FWIW, I found that PDBContext returns the empty string in case of DINameKind::None. | |
76–77 |
Well, there's getInliningInfoForAddress which returns all inline locations, which is used when the -inlines option is set. When it is not set, I implemented the same behaviour here as the DWARF implementation as checked by sym.test. I don't know what's the rationale for that.
That's basically what I did. I took the existing addr.exe from the sym.test test case, converted that to GSYM and stripped DWARF from the object file, and ensured that the test cases create the same output, which they do, except for the missing column information. This covers the inline case. Not sure if there are any tests that check consistency of the DWARF vs. PDB behaviour here. | |
98 | Well, this is an implementation of the DIContext interface, and ideally we would have a full implementation of the interface (or we should at least have a way to indicate that we only have a partial implementation?). The function is called in the context of JIT, by several event listeners and from llvm-rtdlyd right now. I wonder if GSYM support is required there, but if only DWARF is supported there anyway, why does that use the format-neutral DIContext interface? I didn't intend to implement this as part of this patch in any case. | |
llvm/test/tools/llvm-symbolizer/sym-gsymonly.test | ||
2 |
Definitely, I'll add that.
"gsymonly" refers to the fact that the binary doesn't have DWARF debug info but only a corresponding GSYM file. Another case would be that we have both a GSYM file and DWARF debug info. This should probably be tested as well. | |
21 | As I mentioned above, this is indeed a copy of sym.test, using a different input binary. The results should be the same, except for the fact that we don't get any column information from GSYM. I checked the list of test cases again, and arguably some tests seem to only test the formatting of the output, and this is somehow orthogonal to the data source used for symbolication. But at least some are passed into the DIContext implementation (inlining, basename, ...), and I am not sure if we should make assumptions on the implementation detail of which command line options interact with the DIContext implementation at this level. If you think specific test cases should be removed here, please suggest them. |
llvm/lib/DebugInfo/GSYM/GsymDIContext.cpp | ||
---|---|---|
28 | ||
100–101 | as long as we are doing this layer, we might as well fill this in. The code above is a good start. Something like: if (Address.SectionIndex != llvm::object::SectionedAddress::UndefSection) return DILineInfoTable(); if (auto FuncInfoOrErr = Reader->getFunctionInfo(Address.Address)) { if (FuncInfoOrErr->OptLineTable) { const gsym::LineTable < = *FuncInfoOrErr->OptLineTable; const uint64_t StartAddr = Address.Address; const uint64_t EndAddr = Address.Address + Size; for (const auto &gsym::LineEntry : LT) { if (StartAddr <= LineEntry.Addr && LineEntry.Addr < EndAddr) { // Use LineEntry.Addr, LineEntry.File (which is a file index into the // files tables from the GsymReader), and LineEntry.Line (source line // number) to add stuff to the DILineInfoTable } } } } else { consumeError(LineTableOrErr.takeError()); return DILineInfoTable(); } |
llvm/lib/DebugInfo/GSYM/GsymDIContext.cpp | ||
---|---|---|
100–101 | Ok. |
@clayborg Could you comment on the TODOs referring to llvm-symbolizer command line options:
// TODO We should provide an option to provide an alternative directory for// GSYM files.
and
// TODO There should be an option that disables the preference for GSYM.
Does this make sense? Then I'll add these options.
I checked llvm/docs/CommandGuide/llvm-symbolizer.rst, and it doesn't say anything right now about how the data source is selected. Maybe it should, and I can add something, but then this should probably be more than just saying that GSYM is preferred when it's present. I think there's similar logic for .dwp files?
I don't think we need to go into that in this change. I guess the point of the llvm-symbolizer documentation is more about how to use the tool, not what does the tool do under-the-hood, but I thought it was worth checking.
I don't know how gsym data is actually generated, so some of these suggestions may not make much sense.
Option 1) Add the test to the new cross-project-tests test directory. This was created for tests that rely on clang/lld or whatever to generate valid test inputs from source, without the need for a canned binary. You'd include the source and build it directly at run-time. This would only work though if your test addresses are unlikely to change as changes occur in the compiler/linker. llvm-symbolizer was one of the primary motivations for this actually.
Option 2) Check in the assembly required to build the input, and use llvm-mc and other tools (llvm-strip/llvm-gsymutil etc) to turn it into the desired output at runtime.
Option 3) Generate the gsym data directly at run-time somehow.
llvm/test/tools/llvm-symbolizer/sym-gsymonly.test | ||
---|---|---|
2–4 | I'd avoid referencing other tests as part of this comment: sym.test has been on my wishlist as something I'd love to rewrite if I had the time, due to it's usage of precanned binaries, conflation of testing of multiple unrelated options and general vagueness of the test intent. Ideally, we'd avoid mirroring it entirely, in favour of testing the things that need to be tested specifically for this format. | |
21 | I'm a bit constrained on time at the moment, so can't really dig into the source to look at what makes sense to test. Perhaps @clayborg has some suggestions there. I think it's reasonable to make assumptions based on the current implementation, about what is orthoganol and what isn't. If people refactor the area, to change where information is retrieved, it's probably reasonable to expect them to ensure test coverage is still sufficient, but I don't really know. You certainly should be able to drop the llvm-addr2line cases: llvm-addr2line is basically llvm-symbolizer with some different defaults on the options, and one or two minor formatting differences. The underlying source of information is irrelevant. Similarly, you can drop testing that different aliases mean the same thing, as these are tested elsewhere. |
Ok.
I don't know how gsym data is actually generated, so some of these suggestions may not make much sense.
Option 1) Add the test to the new cross-project-tests test directory. This was created for tests that rely on clang/lld or whatever to generate valid test inputs from source, without the need for a canned binary. You'd include the source and build it directly at run-time. This would only work though if your test addresses are unlikely to change as changes occur in the compiler/linker. llvm-symbolizer was one of the primary motivations for this actually.
Option 2) Check in the assembly required to build the input, and use llvm-mc and other tools (llvm-strip/llvm-gsymutil etc) to turn it into the desired output at runtime.
Option 3) Generate the gsym data directly at run-time somehow.
Hm, I think it makes sense to have a common ground for testing both the DWARF and the GSYM path, to ensure these are (mostly) consistent. If I rewrite this to a different approach, then sym.test should be similarly changed. OTOH, this might mean that rewriting both tests might be done at a later point?
I can remove the second canned binary (actually, it might not be required at all) and the canned GSYM file, and create both at runtime from addr.exe. However, the point is that this uses the same underlying binary as sym.test.
Generally, I think option 1 would be too fragile, and I guess that's the reason why sym.test uses the canned binary?
Option 2... might be feasible, but I don't readily know how to do that.
Option 3... hm I am not sure what you imagine there? Generate a GSYM file without a corresponding binary? That might make sense for additional tests, but again, the test I added was purposefully using the same binary as sym.test.
I don't disagree that in principle any debug information implementation should be able to represent the same things, and therefore the testing here should be (in an ideal world) identical. However, there are a couple of points worth raising: the existing sym.test is testing features that are not specific to the debug format. Ideally, they should be pulled into their own test file (see above my comments about rewriting that test). Also, copying this test as-is would increase our maintenance burden over the longer term. A better approach might be to write the GSYM test from scratch, as if the DWARF test didn't exist, and then write a DWARF equivalent one (alternatively the other way around would also work). The important parts of sym.test could then be split off.
I can remove the second canned binary (actually, it might not be required at all) and the canned GSYM file, and create both at runtime from addr.exe. However, the point is that this uses the same underlying binary as sym.test.
As noted above, I don't think it's good for sym.test to be using a canned binary either. The reason it does is purely because at the time of writing it wasn't possible to avoid the canned binary. We have ways to do that now however.
Generally, I think option 1 would be too fragile, and I guess that's the reason why sym.test uses the canned binary?
cross-project-tests is new. sym.test is very old (see above). It's hard to judge whether 1 would be fragile, but the general opinion of the LLD developers is that the output is likely to be fairly stable for simple things going forward, so I don't think it would be too fragile (see the initial discussion on the mailing list about bringing up that test-suite).
Option 2... might be feasible, but I don't readily know how to do that.
I'm afraid I don't know anything about gsym and therefore how to do this either. It should be possible though somehow.
Option 3... hm I am not sure what you imagine there? Generate a GSYM file without a corresponding binary? That might make sense for additional tests, but again, the test I added was purposefully using the same binary as sym.test.
Some of this point may not make sense at all, given my lack of GSYM knowledge, but see also my above comments.
Hm, that makes sense. I'll give the cross-project-tests a look.
However, one general question that comes to my mind here: The addresses will probably be platform-specific. How can I deal with that? I can determine the addresses on my local (Linux amd64) platform, but how would I do the same generally? I can use nm to determine the address of main (which for the canned binary is 0x400540), but I need some offset to identify an inlined location (which is 0x40054d) for the canned binary. I probably can't generally assume that 0xd is an appropriate offset for any platform.
I'd pin to a specific triple, e.g. x86-64 (along with appropriate REQUIRES), so that the addresses aren't going to change due to moving to a different machine.
@jhenderson I am struggling with even building cross-project-tests. How do I enable those? I tried setting calling cmake with either -DLLVM_TOOL_CROSS_PROJECT_TESTS_BUILD=TRUE or -DLLVM_ENABLE_PROJECTS="clang;cross-platform-tests", but those don't seem to do the job.
It's "cross-project-tests" not "cross-platform-tests" in the LLVM_ENABLE_PROJECTS listing.
If you're adding command-line options a) they might better belong in separate later patches, and b) the documentation should be updated to reference them.
Oh no, what a stupid mistake :( Thanks, that did the trick.
If you're adding command-line options a) they might better belong in separate later patches, and b) the documentation should be updated to reference them.
a) Hm, ok, this will just leave us without without the option to disable GSYM usage after the first patch.
b) Sure, I'll add something to llvm/docs/CommandGuide/llvm-symbolizer.rst
Yes, this is true. I don't know if that's an issue or not though? What's the impact on a typical end user in that interim state? In other words, is it likely that something bad will happen?
Not really. A typical end user that didn't use llvm-gsymutil before won't be affected at all.
If Specifier.FNKind is set to DINameKind::ShortName, we want to demangle the name into LineInfo.FunctionName. If it is set to DINameKind::LinkageName or DINameKind::None, set to to the current name. Demangling can be tricky, as there are many name manglings. Not sure if there is a one stop shop to demangle all sorts of names in LLVM.