This is an archive of the discontinued LLVM Phabricator instance.

Add the ability to segment GSYM files.
ClosedPublic

Authored by clayborg on Feb 10 2023, 4:17 PM.

Details

Summary

Some workflows can generate large GSYM files and sharding GSYM files into segments can help some performant workflows that can take advantage of smaller GSYM files. This patch add a new --segment-size option to llvm-gsymutil. This option can specify a rough size in bytes of how large each segment should be.

Segmented GSYM files contain only the strings and files that are needed for the FunctionInfo objects that are added to each shard. The output file path gets the first address of the first contained function info appended as a suffix to the filename. If a base address of an image is set in the GsymCreator, then all segments will use this same base address which allows lookups for symbolication to happen correctly when the image has been slid in memory.

Code has been addeed to refactor and re-use methods within the GsymCreator to allow for segments to be created easily and tested.

Example of segmenting GSYM files:

$ llvm-gsymutil --convert llvm-gsymutil.dSYM -o llvm-gsymutil.gsym --segment-size 10485760
$ ls -l llvm-gsymutil.gsym-*
-rw-r--r-- 1 gclayton staff 10485839 Feb 9 10:45 llvm-gsymutil.gsym-0x1000030c0
-rw-r--r-- 1 gclayton staff 10485765 Feb 9 10:45 llvm-gsymutil.gsym-0x100668888
-rw-r--r-- 1 gclayton staff 10485881 Feb 9 10:45 llvm-gsymutil.gsym-0x100c948b8
-rw-r--r-- 1 gclayton staff 10485954 Feb 9 10:45 llvm-gsymutil.gsym-0x101659e70
-rw-r--r-- 1 gclayton staff 10485792 Feb 9 10:45 llvm-gsymutil.gsym-0x1022b1dc0
-rw-r--r-- 1 gclayton staff 10485889 Feb 9 10:45 llvm-gsymutil.gsym-0x102a18b10
-rw-r--r-- 1 gclayton staff 10485893 Feb 9 10:45 llvm-gsymutil.gsym-0x1030b05d0
-rw-r--r-- 1 gclayton staff 10485802 Feb 9 10:45 llvm-gsymutil.gsym-0x1037caaac
-rw-r--r-- 1 gclayton staff 10485781 Feb 9 10:45 llvm-gsymutil.gsym-0x103e767a0
-rw-r--r-- 1 gclayton staff 10485832 Feb 9 10:45 llvm-gsymutil.gsym-0x10452d0d4
-rw-r--r-- 1 gclayton staff 10485782 Feb 9 10:45 llvm-gsymutil.gsym-0x104b93310
-rw-r--r-- 1 gclayton staff 6255785 Feb 9 10:45 llvm-gsymutil.gsym-0x10526bf34

Diff Detail

Event Timeline

clayborg created this revision.Feb 10 2023, 4:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2023, 4:17 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
clayborg requested review of this revision.Feb 10 2023, 4:17 PM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 10 2023, 4:17 PM

If anyone doesn't know the GSYM code that I added as reviewers, feel free to resign. I just did some blame lines on GSYM code to search for people that have made some fixes to the code since there aren't a lot of people that know the GSYM code!

ayermolo added inline comments.Feb 11 2023, 5:10 PM
llvm/include/llvm/DebugInfo/GSYM/GsymCreator.h
141

Maybe make it 64bit to future proof?

llvm/lib/DebugInfo/GSYM/GsymCreator.cpp
93

Maybe return an error like below? In case this is built with asserts disabled.

avl added inline comments.Feb 13 2023, 6:51 AM
llvm/lib/DebugInfo/GSYM/FunctionInfo.cpp
125

probably, name it a bit more descriptive?

llvm/lib/DebugInfo/GSYM/GsymCreator.cpp
450
yinghuitan added inline comments.Feb 14 2023, 10:49 AM
llvm/lib/DebugInfo/GSYM/GsymCreator.cpp
455

Instead of magic value 4, maybe sizeof(<type-of-offset-entry>) in case it is changed in future?

504–506

Is the lock only needed during accessing Funcs field? Maybe scope locking it instead of locking the entire function.

518–519

Shouldn't this be an error with meaning message instead of simply break and return success?

556

This is not changing per function, right? If so move outside of loop.

557

We should also detect outside loop at least SegmentSize > SegmentFuncInfosSize so that we are not creating dummy segment file with only header.

llvm/tools/llvm-gsymutil/llvm-gsymutil.cpp
322

The line is very long, maybe run clang-format against it.

clayborg updated this revision to Diff 499965.Feb 23 2023, 1:32 PM

Address review comments:

  • createSegment now can return an error if the segment size is too small
  • added code to detect segment sizes that are too small to contain even a single function info
  • added test code to test for small segment sizes
clayborg marked 6 inline comments as done.Feb 23 2023, 1:34 PM
clayborg added inline comments.
llvm/lib/DebugInfo/GSYM/GsymCreator.cpp
504–506

This is correctly scoped right at the end of the function to only lock during Funcs modification and using the Funcs.back().

518–519

It isn't an error, createSegment returns NULL if there are no more functions which is ok.

518–519

I now return an expected to catch the error case where segment size is too small to contain a single function info.

556

It is changing per function, since each function info can add more strings and files to the GSYM file. calculateHeaderAndTableSize is fast to calculate. So this needs to stay here.

557

I will move this if statement after the copyFunctionInfo to ensure we always create a GSYM with at least one entry.

557

I modified the code to detect when no function infos have been emitted and return an error when I detect that. Also added a test.

yinghuitan accepted this revision.Mar 2 2023, 10:59 AM
yinghuitan added inline comments.
llvm/lib/DebugInfo/GSYM/GsymCreator.cpp
504–506

My original worry is compiler can hoist Guard to the beginning of the function causing the function scope locking. Now thinking about this, compiler shouldn't be doing that because it will have the side effect of changing the locking semantics. Still, I think it would be more explicit to add these into a sub-scope.

This revision is now accepted and ready to land.Mar 2 2023, 10:59 AM
ayermolo accepted this revision.Mar 2 2023, 4:14 PM
ayermolo added inline comments.
llvm/tools/llvm-gsymutil/llvm-gsymutil.cpp
112

Should we have some kind of sane default?

This revision was landed with ongoing or failed builds.Mar 2 2023, 8:40 PM
This revision was automatically updated to reflect the committed changes.
chapuni added inline comments.
llvm/lib/DebugInfo/GSYM/GsymCreator.cpp
126

MaxAddressOffset is used only here.

I get the following with this patch

../unittests/DebugInfo/GSYM/GSYMTest.cpp:2492:12: error: call to deleted constructor of 'llvm::Error'
    return FinalizeErr;
           ^~~~~~~~~~~
../include/llvm/Support/Error.h:185:3: note: 'Error' has been explicitly marked deleted here
  Error(const Error &Other) = delete;
  ^
../include/llvm/Support/Error.h:492:18: note: passing argument to parameter 'Err' here
  Expected(Error Err)
                 ^
../unittests/DebugInfo/GSYM/GSYMTest.cpp:2499:12: error: call to deleted constructor of 'llvm::Error'
    return Err;
           ^~~
../include/llvm/Support/Error.h:185:3: note: 'Error' has been explicitly marked deleted here
  Error(const Error &Other) = delete;
  ^
../include/llvm/Support/Error.h:492:18: note: passing argument to parameter 'Err' here
  Expected(Error Err)
                 ^
2 errors generated.

Also seen here
https://lab.llvm.org/buildbot/#/builders/109/builds/58893