This is an archive of the discontinued LLVM Phabricator instance.

Use LLVM's debug line parser in LLDB
ClosedPublic

Authored by JDevlieghere on May 28 2019, 11:45 PM.

Details

Summary

The line number table header was substantially revised in DWARF 5, which is not supported by LLDB's current debug line parser. Given the relatively limited contact surface between the code to parse the debug line section and the rest of LLDB, this seems like a good candidate to replace with LLVM's implementation, which does support the new standard.

In its current state this patch is mostly a proof-of-concept to show that this transition is possible. There are currently 8 failing tests, all related to the file paths from the prologue. (I know because I did the change in two parts, first the line table entries, then the so-called *support files*) I have yet to look at the failures in detail, but they are related to rnglists and split DWARF. Additionally, I'll need to ensure that there's minimal duplicate work when creating an LLVM DWARFContext and CompileUnit. We should only be parsing the UnitDIE, which I think is reasonable.

Diff Detail

Repository
rL LLVM

Event Timeline

JDevlieghere created this revision.May 28 2019, 11:45 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 28 2019, 11:45 PM

Interesting stuff. I was wondering why you even needed to construct the DWARFContext and DWARFUnit, so I looked around a bit. It looks like they are only used for calling DWARFFormValue::extract to fetch the strings from the string table in the V5 case. So, it sounds like we need the DWARFContext to reach the string table, but maybe we don't need to create the DWARFUnit ? After all, one of the main goals of DWARF5 debug_lines was to have them be independent of the debug_info section.

What I'm trying to get at here is that it may be possible to limit the amount of work being done in DWARFContext by passing it only the small set of sections that it needs to get this parsed (something like debug_line, debug_str, debug_line_str ?).

BTW, I am also going to be touching this code slightly, as I'm trying to make sure the "support file" lists are shared between the various type units. However, it hopefully won't conflict with this too much, as I mostly approach it from the "other end" (i.e., what happens to the file lists after they have been parsed).

Very nice!

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
874 ↗(On Diff #201830)

should this be a getAsLLVM() member function or even an operator()?

overall looks good. See inlined comments.

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
929–938 ↗(On Diff #201830)

Can we make a function out of this that just returns the "const llvm::DWARFDebugLine::LineTable *"? Or make a new location variable that is just a "const llvm::DWARFDebugLine::LineTable *" to avoid the "(*line_table)->" stuff below?

grimar added inline comments.May 30 2019, 1:40 AM
lldb/include/lldb/Core/FileSpecList.h
29 ↗(On Diff #201830)

Seems you don't use iterator anywhere?

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
792 ↗(On Diff #201830)

You do not use stmt_list anywhere else, so I would suggest to:

if (cu_die.GetAttributeValueAsUnsigned(DW_AT_stmt_list, DW_INVALID_OFFSET) ==
    DW_INVALID_OFFSET)
  continue;
879 ↗(On Diff #201830)

AddSection->addSection?

929–938 ↗(On Diff #201830)

+1 for new location variable.

974 ↗(On Diff #201830)

Seems you can avoid having found_original and found_remapped:
(assuming table is a new variable mentioned above)

if (!table->getFileNameByIndex(
        index + 1, compile_dir,
        llvm::DILineInfoSpecifier::FileLineInfoKind::Default, original_file))
  continue;

BTW, how good is the llvm debug_line parser when it comes to handling invalid input? Looking through the asserts in the file, it looks like at least two of them can be tripped by feeding it bad data:

cmtice added a subscriber: cmtice.Jun 10 2019, 10:05 AM
  • Rebase on current master
  • Fix bug where we weren't actually creating any sections in the LLVM DWARF Context
  • Don't create an LLVM DWARFUnit
  • Get rid of (*line_table)->
  • Add logging instead of swallowing errors
JDevlieghere marked 7 inline comments as done.Jul 12 2019, 5:20 PM
JDevlieghere added inline comments.
lldb/include/lldb/Core/FileSpecList.h
29 ↗(On Diff #201830)

It's used in the range-based for loop in ParseSupportFiles

JDevlieghere marked 2 inline comments as done.
  • Address remaining comment from George
labath added inline comments.Jul 16 2019, 3:32 AM
lldb/include/lldb/Core/FileSpecList.h
191–192 ↗(On Diff #209650)

If both of these are returning the const iterators, then you don't need non-const versions..

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
804 ↗(On Diff #209650)

What's the reason for this global m_support_files map? Is it just because you cannot set the "support" file list from inside the ParseLineTable call, and parsing the line table produces the file list as a "side effect". I've ran into the same problem in SymbolFileBreakpad, and I've had to bend over backwards to ensure we don't store the file list twice, so maybe it's time to have a more general solution to that. I think it would suffice to add CompileUnit::SetSupportFiles (to complement CompileUnit::SetLineTable). That way, you could call both comp_unit->SetSupportFiles and ->SetLineTable regardless of which one was being asked for.

832 ↗(On Diff #209650)

I guess here you ought to to something similar to the block of code on line 981. (i.e., parse the line table, fetch files from it, and remap them. The only difference would be that you don't use the compilation directory from the type unit to resolve relative paths (because type units have no DW_AT_comp_dir).

JDevlieghere marked 3 inline comments as done.
JDevlieghere retitled this revision from [WIP] Use LLVM's debug line parser in LLDB to Use LLVM's debug line parser in LLDB.
  • Address code review feedback
  • Fix support files for type units

So in the actual line table parsing code it seems like we are creating a line table in LLVM format, complete with sequences and a bunch of LLVM data structures, and then copying the results over into LLDB and then throwing away the LLVM line table. It would be nicer to modify the LLVM line table code to be able to get a callback when a row was pushed (if it isn't already in that format) and expose that so clients, like LLDB, can just get a callback and only store the data structure we care about. This would allow us to share the llvm line table parsing code and re-use it in LLDB, but not require twice the memory and overhead. Symbolication tools could also take advantage of this by not storing any rows except the closest matching row.

lldb/source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
201 ↗(On Diff #210222)

I am worried about this continue stuff. If this fails, all of our file indexes will be off and nothing will make sense. We need the indexes in the support files to match those in the prologue.

208 ↗(On Diff #210222)

Ditto

211 ↗(On Diff #210222)

Is this FileSpec constructor going to do the right thing for all path types (windows, linux, etc)?

JDevlieghere marked 3 inline comments as done.

Keep the indices correct when remapping a file fails.

We also need to test to ensure this doesn't regress parsing speed. Need a large C++ project, like LLDB with debug clang and debug llvm, and something that forces all line tables to be parsed (not just the prologues like when setting a breakpoint) and see how things compare between the two methods.

Rebase and fix a bug in the support file implementation. I'm not sure why this didn't manifest itself earlier.

I have a plan to benchmark this (using image dump line-table) but I probably won't get around to that today. Hopefully I'll have data by early next week.

aprantl added inline comments.Aug 9 2019, 5:15 PM
lldb/include/lldb/Core/FileSpecList.h
29 ↗(On Diff #201830)

Technically I think these typedefs should use LLDB style

using Collection = std::vector<FileSpec>; / typedef std::vector<FileSpec> Collection.

Though not the const_iterator, that would be silly.

I used image dump line-table to benchmark the performance difference. I did a RelWithDebugInfo build of clang and picked 237 files to dump.

repeat 10 "time ./bin/lldb -b -o 'image dump line-table ARCMT.cpp ARCMTActions.cpp FileRemapper.cpp ObjCMT.cpp PlistReporter.cpp TransAPIUses.cpp TransARCAssign.cpp TransAutoreleasePool.cpp TransBlockObjCVariable.cpp TransEmptyStatementsAndDealloc.cpp TransGCAttrs.cpp TransGCCalls.cpp TransProperties.cpp TransProtectedScope.cpp TransRetainReleaseDealloc.cpp TransUnbridgedCasts.cpp TransUnusedInitDelegate.cpp TransZeroOutPropsInDealloc.cpp TransformActions.cpp Transforms.cpp APValue.cpp ASTConsumer.cpp ASTContext.cpp ASTDiagnostic.cpp ASTDumper.cpp ASTImporter.cpp ASTImporterLookupTable.cpp ASTStructuralEquivalence.cpp ASTTypeTraits.cpp AttrImpl.cpp CXXInheritance.cpp Comment.cpp CommentBriefParser.cpp CommentCommandTraits.cpp CommentLexer.cpp CommentParser.cpp CommentSema.cpp ComparisonCategories.cpp DataCollection.cpp Decl.cpp DeclBase.cpp DeclCXX.cpp DeclFriend.cpp DeclGroup.cpp DeclObjC.cpp DeclOpenMP.cpp DeclPrinter.cpp DeclTemplate.cpp DeclarationName.cpp Expr.cpp ExprCXX.cpp ExprClassification.cpp ExprConstant.cpp ExprObjC.cpp ExternalASTSource.cpp FormatString.cpp ItaniumCXXABI.cpp ItaniumMangle.cpp JSONNodeDumper.cpp Mangle.cpp MicrosoftCXXABI.cpp MicrosoftMangle.cpp NSAPI.cpp NestedNameSpecifier.cpp ODRHash.cpp OSLog.cpp OpenMPClause.cpp ParentMap.cpp PrintfFormatString.cpp QualTypeNames.cpp RawCommentList.cpp RecordLayout.cpp RecordLayoutBuilder.cpp ScanfFormatString.cpp SelectorLocationsKind.cpp Stmt.cpp StmtCXX.cpp StmtIterator.cpp StmtObjC.cpp StmtOpenMP.cpp StmtPrinter.cpp StmtProfile.cpp StmtViz.cpp TemplateBase.cpp TemplateName.cpp TextNodeDumper.cpp Type.cpp TypeLoc.cpp TypePrinter.cpp VTTBuilder.cpp VTableBuilder.cpp ASTMatchFinder.cpp ASTMatchersInternal.cpp Parser.cpp AnalysisDeclContext.cpp BodyFarm.cpp CFG.cpp CFGReachabilityAnalysis.cpp CFGStmtMap.cpp CallGraph.cpp CloneDetection.cpp CocoaConventions.cpp CodeInjector.cpp ConstructionContext.cpp Consumed.cpp Dominators.cpp ExprMutationAnalyzer.cpp LiveVariables.cpp ObjCNoReturn.cpp PostOrderCFGView.cpp ProgramPoint.cpp ReachableCode.cpp RetainSummaryManager.cpp ThreadSafety.cpp ThreadSafetyCommon.cpp ThreadSafetyTIL.cpp UninitializedValues.cpp Attributes.cpp Builtins.cpp CharInfo.cpp CodeGenOptions.cpp Cuda.cpp Diagnostic.cpp DiagnosticIDs.cpp DiagnosticOptions.cpp FileManager.cpp FileSystemStatCache.cpp FixedPoint.cpp IdentifierTable.cpp LangOptions.cpp LangStandards.cpp Module.cpp ObjCRuntime.cpp OpenMPKinds.cpp OperatorPrecedence.cpp SanitizerBlacklist.cpp SanitizerSpecialCaseList.cpp Sanitizers.cpp SourceLocation.cpp SourceManager.cpp TargetInfo.cpp AArch64.cpp AMDGPU.cpp ARC.cpp ARM.cpp AVR.cpp BPF.cpp Hexagon.cpp Lanai.cpp Le64.cpp MSP430.cpp Mips.cpp NVPTX.cpp OSTargets.cpp PNaCl.cpp PPC.cpp RISCV.cpp SPIR.cpp Sparc.cpp SystemZ.cpp TCE.cpp WebAssembly.cpp X86.cpp XCore.cpp Targets.cpp TokenKinds.cpp Version.cpp Warnings.cpp XRayInstr.cpp XRayLists.cpp BackendUtil.cpp CGAtomic.cpp CGBlocks.cpp CGBuiltin.cpp CGCUDANV.cpp CGCUDARuntime.cpp CGCXX.cpp CGCXXABI.cpp CGCall.cpp CGClass.cpp CGCleanup.cpp CGCoroutine.cpp CGDebugInfo.cpp CGDecl.cpp CGDeclCXX.cpp CGException.cpp CGExpr.cpp CGExprAgg.cpp CGExprCXX.cpp CGExprComplex.cpp CGExprConstant.cpp CGExprScalar.cpp CGGPUBuiltin.cpp CGLoopInfo.cpp CGNonTrivialStruct.cpp CGObjC.cpp CGObjCGNU.cpp CGObjCMac.cpp CGObjCRuntime.cpp CGOpenCLRuntime.cpp CGOpenMPRuntime.cpp CGOpenMPRuntimeNVPTX.cpp CGRecordLayoutBuilder.cpp CGStmt.cpp CGStmtOpenMP.cpp CGVTT.cpp CGVTables.cpp CodeGenAction.cpp CodeGenFunction.cpp CodeGenModule.cpp CodeGenPGO.cpp CodeGenTBAA.cpp CodeGenTypes.cpp ConstantInitBuilder.cpp CoverageMappingGen.cpp ItaniumCXXABI.cpp MacroPPCallbacks.cpp MicrosoftCXXABI.cpp ModuleBuilder.cpp ObjectFilePCHContainerOperations.cpp PatternInit.cpp SanitizerMetadata.cpp SwiftCallingConv.cpp TargetInfo.cpp VarBypassDetector.cpp CrossTranslationUnit.cpp Action.cpp Compilation.cpp DarwinSDKInfo.cpp Distro.cpp Driver.cpp DriverOptions.cpp Job.cpp Multilib.cpp Phases.cpp SanitizerArgs.cpp Tool.cpp' ~/llvm/build-benchmark/bin/clang > /dev/null"

LLDB Line Table Parser

6.13 real         4.97 user         1.13 sys
6.15 real         5.00 user         1.12 sys
6.11 real         4.96 user         1.11 sys
6.10 real         4.95 user         1.12 sys
6.14 real         4.98 user         1.13 sys
6.15 real         4.98 user         1.13 sys
6.10 real         4.94 user         1.13 sys
6.18 real         5.02 user         1.13 sys
6.15 real         4.95 user         1.16 sys
6.06 real         4.92 user         1.10 sys
--------------------------------------------
6.13 real

LLVM Line Table Parser

6.34 real         5.11 user         1.16 sys
6.34 real         5.13 user         1.17 sys
6.35 real         5.14 user         1.17 sys
6.29 real         5.09 user         1.15 sys
6.27 real         5.07 user         1.16 sys
6.32 real         5.13 user         1.15 sys
6.21 real         5.04 user         1.13 sys
6.13 real         4.99 user         1.11 sys
6.08 real         4.93 user         1.12 sys
6.13 real         4.97 user         1.12 sys
--------------------------------------------
6.25 real

There's a small slowdown (~1.9%) which is not unexpected given that we need to do bridging to LLVM.

Thanks for running the perf tests, that will be great to keep handy somewhere. Might be good to check this test in somewhere. I still think that making changes to the llvm line parser might be a good idea where we can get a callback when a row is to be pushed with the current line table state. This would allow symbolication clients that need just one address, like in atos kind of cases, to not have to store an array of these things and then search them. It would also allow us to just populate our line table as we see fit without the need to have complete LLVM tables and then convert them. Shouldn't be very hard to modify the code in llvm. Let me know what you think.

aprantl accepted this revision.Aug 13 2019, 10:00 AM

Two percent is a noticeable price tag, but since we are doing extra work with error handling and building a copy of the line table in memory first; understandable, I'm moderately optimistic that we could increase the performance a bit in the future by eliminating the extra copy step, though perhaps not with some larger changes to the LLVM interface. Since we need this for DWARF 5 support and the performance penalty isn't that bad, this patch gets a thumbs up from me. I ssume you profiled the code and there isn't any obvious silliness in the implementation?

This revision is now accepted and ready to land.Aug 13 2019, 10:00 AM

I agree with Adrian, this isn't something that needs to hold up this patch, but I do think it is worth doing soon in LLVM. The line table parsing code just needs to take a std::function (or llvm equivalent) that gets called like a lambda when a row is to be pushed. The current LLVM parsing code just has this callback create its own line tables as is currently done. We would just create our own line tables using our callback. So it won't take much work at all in llvm to make this happen and then be able to adopt it in LLDB.

Thank you! I agree that extending the LLVM parser is a good idea, especially if it's already using a callback internally. I'll add a comment in the code and spend some time looking at that in the coming days.

clayborg accepted this revision.Aug 13 2019, 12:36 PM
This revision was automatically updated to reflect the committed changes.

Thank you for working on this.

lldb/trunk/source/Plugins/SymbolFile/DWARF/CMakeLists.txt