Page MenuHomePhabricator

Add GSYM utility files along with unit tests.

Authored by clayborg on Jun 10 2019, 3:59 PM.



The full GSYM patch started with:

In that patch we wanted to split up getting GSYM into the LLVM code base so we are not committing too much code at once.

This is a first in a series of patches where I only add the foundation classes along with complete unit tests. They provide the foundation for encoding and decoding a GSYM file.

File entries are defined in llvm::gsym::FileEntry. This class splits the file up into a directory and filename represented by uniqued string table offsets. This allows all files that are referred to in a GSYM file to be encoded as 1 based indexes into a global file table in the GSYM file.

Function information in stored in llvm::gsym::FunctionInfo. This object represents a contiguous address range that has a name and range with an optional line table and inline call stack information.

Line table entries are defined in llvm::gsym::LineEntry. They store only address, file and line information to keep the line tables simple and allows the information to be efficiently encoded in a subsequent patch.

Inline information is defined in llvm::gsym::InlineInfo. These structs store the name of the inline function, along with one or more address ranges, and the file and line that called this function. They also contain any child inline information.

There are also utility classes for address ranges in llvm::gsym::AddressRange, and string table support in llvm::gsym::StringTable which are simple classes.

The unit tests test all the APIs on these simple classes so they will be ready for the next patches where we will create GSYM files and parse GSYM files.

Diff Detail


Event Timeline

clayborg created this revision.Jun 10 2019, 3:59 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 10 2019, 3:59 PM
Herald added a subscriber: mgorny. · View Herald Transcript

Sorry I wasn't very quick to get to this. This looks really good to me. I'm going to go over some of the this with a fine tooth comb tomorrow but It basically looks good as at high level glance.

47 ↗(On Diff #203920)

Return Option<uint32_t> instead.

clayborg updated this revision to Diff 204556.Jun 13 2019, 8:41 AM


uint32_t llvm::gsym::StringTable::find(StringRef Str) const;

to return optional:

llvm::Optional<uint32_t> llvm::gsym::StringTable::find(StringRef Str) const;

as suggested.

clayborg marked an inline comment as done.Jun 13 2019, 8:41 AM
JDevlieghere requested changes to this revision.Jun 13 2019, 12:44 PM
JDevlieghere added inline comments.
13 ↗(On Diff #203920)

This doesn't conform to the LLVM #include style:

24 ↗(On Diff #203920)

Should this be a Doxygen comment for this struct?

28 ↗(On Diff #203920)

The comment for both is identical. I'd group them like

/// Offsets in the string table.
/// @{
/// @}
26 ↗(On Diff #204556)

Same comment.

51 ↗(On Diff #204556)

Can we make it so that only valid entries are created, for instance by using an Expected<FunctionInfo> wherever it's created? That way we don't need the isValid and maybe even the clear? This is more of a high level comment that applies to the whole diff.

82 ↗(On Diff #204556)

Doxygen comment?

24 ↗(On Diff #204556)

Should this be a Doxygen comment for the struct? Applies for the other classes/structs as well.

This revision now requires changes to proceed.Jun 13 2019, 12:44 PM
clayborg updated this revision to Diff 204805.Jun 14 2019, 10:28 AM
clayborg marked an inline comment as done.

Address review comments:

  • make all comment block doxygen blocks
clayborg marked 6 inline comments as done.Jun 14 2019, 10:28 AM
clayborg added inline comments.
51 ↗(On Diff #204556)

Both the DWARF, breakpad, and symbtab converters will create a temp object and then populate, then check if all is good before adding it. It is possible to avoid this, but the code will be cleaner if we can create the object and then populate it lazily. So I would prefer to keep this if possible.

A bunch of nits but this looks good to me.

87 ↗(On Diff #204805)

nit: Most of the branches here are not tested by my eyes.

98 ↗(On Diff #204805)

The motivation for comparing by lines like this is a bit lost on me. Is it just to ensure that if X != Y then either X < Y or Y < X? You'd have to do that for inlines as well if that's the case. You also don't maintain that property on LineInfo anyway.

33 ↗(On Diff #204805)

In other cases you forwent the dump method and overloaded <<, why not do that here as well?

40 ↗(On Diff #204805)

maybe "isContiguousWith"

43 ↗(On Diff #204805)

maybe "overlaps" or just "intersects" or "overlapsWith"

54 ↗(On Diff #204805)

If looking up the greatest lower bound on an address in an array sorted by this predicate, the largest range possible would be chosen. It seems to me that we would actually prefer the *smallest* range be chosen because its information would be more specific. That would suggest that instead of LHS.End < RHS.End we should actually use LHS.End > RHS.End

68 ↗(On Diff #204805)

prefer using

39 ↗(On Diff #204805)

same question

49 ↗(On Diff #204805)

What is this needed for? I'm not sure I'm happy with this sort of linear search being needed.

41 ↗(On Diff #204805)

return Option<std::vector<...>> instead perhaps?

22 ↗(On Diff #204805)

Do we expect the number of ranges to be small?

26–31 ↗(On Diff #204805)

None of this is needed and should fallout from the std::upper_bound call

32 ↗(On Diff #204805)


34 ↗(On Diff #204805)

In general I think you want a greatest lower bound which is what you're doing but if you keep your current '<' operator and ignore my comment above about swaping the order of 'End' you can use std::lower_bound here and avoid the subtraction case.

aprantl added inline comments.Jun 14 2019, 1:16 PM
41 ↗(On Diff #204805)

It would be so much easier to reason about correctness if these structs were immutable. Doe we really need a clear() method?

clayborg updated this revision to Diff 205123.Jun 17 2019, 10:52 AM
clayborg marked 9 inline comments as done.
clayborg marked 7 inline comments as done.Jun 17 2019, 10:54 AM

Marking inline comments as done and mentioning that I did add tests for FunctionInfo operator < as well for cases where things have inline info and line entries

87 ↗(On Diff #204805)

I'll add some tests.

97–98 ↗(On Diff #204805)

The main goal is to be able to sort FuncrtionInfo objects after we fill them out from a variety of places (symtab, DWARF, breakpad) and quickly pick out the one with the best info. We saw many issues with debug info when writing the parser where address ranges would overlap or differ for the same range. So we tried to pick the best info. We can modify this sorting as needed.

41 ↗(On Diff #204805)

These structs are not encoded in memory like this, they are very efficiently encoded and need to be decoded from a stream of bytes. These items are parsed from a stream, and we either parse everything (if you have a symbolication process that will stay around), or just what we need for a given address (for use in a tool like atos). Not sure if that says anything about wether these structs should be immutable though. Usually during lookups you might call clear on a FunctionInfo's Inline argument in case you lookup multiple addresses into the same FunctionInfo. That being said, nothing is stopping us from changing the APIs that we use for parsing to always return a new instance, or a llvm::Optional.

54 ↗(On Diff #204805)

I would expect an address with the same start address to still compare the end address:

[0x100-0x101) < [0x100-0x102)

with lower bound you always have to know to back up by one in case the the addresses are equal.

68 ↗(On Diff #204805)

llvm/include/llvm/DebugInfo/GSYM/Range.h:71:12: error: using declaration cannot refer to a template specialization
using std::vector<AddressRange> AddressRanges;

^     ~~~~~~~~~~~~~~
49 ↗(On Diff #204805)

I can take this out. This class is used for the mmap'ed string table, not for creating the string table, so I will remove this.

22 ↗(On Diff #204805)

Yes, usually not usually a ton of these.

31 ↗(On Diff #204805)

Will remove. Need empty check though.

34 ↗(On Diff #204805)

I tried that with:

bool llvm::gsym::contains(const AddressRanges &Ranges, uint64_t Addr) {
  if (Ranges.empty())
    return false;
  auto Begin = Ranges.begin();
  auto EndPos = Ranges.end();
  auto Pos = std::lower_bound(Begin, EndPos, Addr);
  if (Pos != EndPos)
    return Pos->contains(Addr);
  return false;

But this failed in the following cases:

EXPECT_TRUE(contains(Ranges, 0x2000-1));
EXPECT_TRUE(contains(Ranges, 0x3000-1));
EXPECT_TRUE(contains(Ranges, 0x5000-1));
jakehehrlich added inline comments.Jun 17 2019, 6:00 PM
97–98 ↗(On Diff #204805)

Ah sorry I meant to click for the line above where it says LHS.Lines < RHS.Lines I mean that that specifically doesn't make sense to me. It's not clear that that would correlate to a better answer in anyway to me.

54 ↗(On Diff #204805)

You're right. I just got by logic a bit wrong there. That was the result I expected but I got my self turned around in the reasoning somehow.

68 ↗(On Diff #204805)

using AddressRanges = std::vector<AddressRange>; is what I meant.

31 ↗(On Diff #204805)

I think if you remove

if (Pos == EndPos)
    return Ranges.back().contains(Addr);

Then in the empty case Pos == EndPos == Begin which means if (Pos != Begin) will jump to the return false line right?

34 ↗(On Diff #204805)

Ah right, It doesn't give you a strict lower bound. I suppose what you have is best. Why is the first algorithm we learn in school so hard to get right?

clayborg updated this revision to Diff 205362.Jun 18 2019, 8:38 AM

Converted AddressRange objects into a class so we can ensure consistent and correct results when using accessors.

Converted AddressRanges into a class instead of a typedef to a vector of AddressRange objects. Converted all places that were directly using the vector to use the class. The AddressRanges now will combine intersecting address ranges and not add empty AddressRange objects to a list. This will help normalize address range objects so we can emit them efficiently into the final GSYM file.

clayborg marked 3 inline comments as done.Jun 18 2019, 8:39 AM

Marked things as done.

69 ↗(On Diff #205362)

I ended up converting AddressRanges to a class for more control and normalization of address range collections.

friendly ping

It would be great to get this moving. I have figured out how the rest of the patches will be broken up to make them really simple to test and get in, but need this one to go in first.

jakehehrlich accepted this revision.Jun 25 2019, 10:59 AM

Sorry about the lull here. This LGTM

aprantl added inline comments.Jun 25 2019, 1:59 PM
24 ↗(On Diff #203920)

I think it needs to go *before* the struct FileEntry { line, doesn't it?

This revision was not accepted when it landed; it landed in state Needs Review.Jun 26 2019, 7:09 AM
This revision was automatically updated to reflect the committed changes.

Fixed build bots as they showed an iterator was being used even if it was the end of a collection.

$ svn commit
Sending lib/DebugInfo/GSYM/Range.cpp
Transmitting file data .done
Committing transaction...
Committed revision 364447.

kcc added a subscriber: kcc.Jun 26 2019, 9:59 AM

I think this is causing a failure on the asan bot. Please fix or revert ASAP

[==========] Running 1 test from 1 test case.
[----------] Global test environment set-up.
[----------] 1 test from GSYMTest

[ RUN ] GSYMTest.TestRanges

19204==ERROR: AddressSanitizer: container-overflow on address 0x606000000390 at pc 0x00000052cde7 bp 0x7ffdbaad4740 sp 0x7ffdbaad4738

READ of size 8 at 0x606000000390 thread T0

#0 0x52cde6 in endAddress /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/DebugInfo/GSYM/Range.h:48:40
#1 0x52cde6 in intersects /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/DebugInfo/GSYM/Range.h:58
#2 0x52cde6 in intersect /b/sanitizer-x86_64-linux-fast/build/llvm/include/llvm/DebugInfo/GSYM/Range.h:61
#3 0x52cde6 in llvm::gsym::AddressRanges::insert(llvm::gsym::AddressRange const&) /b/sanitizer-x86_64-linux-fast/build/llvm/lib/DebugInfo/GSYM/Range.cpp:35
#4 0x511b8d in GSYMTest_TestRanges_Test::TestBody() /b/sanitizer-x86_64-linux-fast/build/llvm/unittests/DebugInfo/GSYM/GSYMTest.cpp:350:10
#5 0x5bcd00 in HandleExceptionsInMethodIfSupported<testing::Test, void> /b/sanitizer-x86_64-linux-fast/build/llvm/utils/unittest/googletest/src/

The test binary looks like it has many dependencies it doesn't actually need:


This looks unused?


This, too.


As does this.


And this.


And this.

MaskRay added inline comments.

llvm has switched to Apache 2.0. Please check other files for the correct licence notice.

MaskRay added inline comments.Jun 28 2019, 3:20 AM
30 ↗(On Diff #204805)

These mutable setters do not seem very useful. I deleted them in r364637

40 ↗(On Diff #204805)

The name still doesn't make much sense to me. Deleted in r364634

59 ↗(On Diff #204805)

Comparison between different types of objects may bring confusion. They were only used in upper_bound. I deleted them in r364634.

68 ↗(On Diff #204805)

Fixed in r364634


Removed in r364634