This is an archive of the discontinued LLVM Phabricator instance.

[WIP][DebugInfo] refactor DIE classes. Remove dependency on AsmPrinter.
AbandonedPublic

Authored by avl on Mar 17 2020, 9:08 AM.

Details

Summary

Remove dependency of DWARF generation code on AsmPrinter.

This patch is not for integration. I want to consult whether it is going
in the right direction. It demonstrates the possible variant of refactoring.
Not all cases/problems are resolved in this patch currently.

During work on "Remove obsolete debug info in lld" I need to generate
DWARF without using AsmPrinter(To not to add extra libraries to lld).
The current implementation of DWARF generation is heavily depends on
using AsmPrinter. F.e. CodeGen/DIE.h

class DIEInteger {

void emitValue(const AsmPrinter *Asm, dwarf::Form Form) const;
unsigned SizeOf(const AsmPrinter *AP, dwarf::Form Form) const;

}

I think it would be a bad idea to write a new set of DWARF generating
classes, not depending on AsmPrinter. Instead, it would be better
to reduce dependence on AsmPrinter for current structures.
It would allow for reusing existed implementation.

There is already a place which implements such separation :

AsmPrinter/ByteStreamer.h:ByteStreamer
AsmPrinter/DwarfDebug.h:DwarfDebug::emitDebugLocEntry(ByteStreamer &Streamer)

But it is used locally. I think the same idea could be implemented
in a more general way:

There should be a class that allows writing byte data in the dwarf section: DwarfDebugSection.
DwarfDebugSection would have two descendands: DwarfDebugSectionAsm, DwarfDebugSectionBin.
There should be a class that allows writing DWARF constructions(units, line tables,
address ranges) to DwarfDebugSection: DwarfDebugFile.
DIE* classes from CodeGen/DIE.h should be moved into DebugInfoDWARF library.
DwarfDebugSection & DwarfDebugFile should be put into DebugInfoDWARF library.
In a far perspective: DwarfDebugFile could replace AsmPrinter/DwarfFile,
other classes would be moved into DebugInfoDWARF library(DwarfStringPool...).
The above changes would allow generating DWARF without AsmPrinter.
Also, this implementation could probably be reused by debug info tools(dsymutil,
llvm-gsymutil, llvm-dwp...).

This patch presents some changes which show the direction of the refactoring.

What do you think? Would this refactoring be useful?
Is it OK to do things in this manner?

If that idea is OK, then changes could be done incrementally,
moving code(currently located in AsmPrinter) into DebugInfoDWARF library.

Diff Detail

Event Timeline

avl created this revision.Mar 17 2020, 9:08 AM

I too ran into the AsmPrinter being a bit overkill for writing simple data. GSYM has a simple file writer class that just uses llvm::raw_pwrite_stream:

llvm/include/llvm/DebugInfo/GSYM/FileWriter.h
llvm/lib/DebugInfo/GSYM/FileWriter.cpp

Not sure if this would be worth improving and moving to a more generic location so that everyone can use it? It is a very simple class right now, no labels or sections, but these features could be added if needed. GSYM files are currently stand alone files, but in the future we will want GSYM to be a section within a file.

During work on "Remove obsolete debug info in lld" I need to generate DWARF without using AsmPrinter(To not to add extra libraries to lld).

Could you point to where this is a requirement? (has this already been discussed in review/found to be an unacceptable dependency? llvm-dsymutil uses the existing AsmPrinter-based APIs to write the output, yes?)

I think it would be a bad idea to write a new set of DWARF generating classes, not depending on AsmPrinter.

Agreed

Instead, it would be better to reduce dependence on AsmPrinter for current structures.

Agreed, if the dependency is unacceptable.

There is already a place which implements such separation :

AsmPrinter/ByteStreamer.h:ByteStreamer
AsmPrinter/DwarfDebug.h:DwarfDebug::emitDebugLocEntry(ByteStreamer &Streamer)

For historics: That API was added by @echristo to allow both emitting the bytes and hashing the bytes of a location - so it does represent a small instance of the abstraction you're looking for, yes.

But it is used locally. I think the same idea could be implemented in a more general way:

There should be a class that allows writing byte data in the dwarf section: DwarfDebugSection. DwarfDebugSection would have two descendands: DwarfDebugSectionAsm, DwarfDebugSectionBin.

I'd expect it, perhaps, to be more like ByteStreamer - which has a hashing and an AsmPrinter implementation. Though in this case we'd start with only an AsmPrinter implementation (AsmPrinter already abstracts over the asm/binary distinction, so there's hopefully no need to float that up/duplicate that in the DwarfDebugSection abstraction).

(I'm leaving naming mostly aside here for now - that'll have to be iterated on at some point)

There should be a class that allows writing DWARF constructions(units, line tables, address ranges) to DwarfDebugSection: DwarfDebugFile.

Not quite sure what you're picturing with DwarfDebugFile - I guess you mean an abstraction over a collection of DwarfDebugSections related to a particular file (eg: one for a .o file, one for a .dwo file, perhaps).

Though looking at the API for DwarfDebugFile in your proposed patch, it seems to have features I'd consider too high level for that abstraction - like writeCompileUnitHeader etc - anything that is higher level than assembly (so ulebs, ints, relocations, labels) seems inappropriate for the abstraction. AsmPrinter should be the first/trivially wrapped in this abstraction, I think. (I wouldn't mind something less stateful, though - as you say, the section V file abstraction, though that might lead to some confusing assembly - since AsmPrinter is a streaming abstraction, so if you have a stateful section then the usage code might end up thinking it's totally fine to write a few things to one section, then a few things to another section - which creates for less-than-ideal assembly (fragmented sections) & so I think that'd be unfortunate)

DIE* classes from CodeGen/DIE.h should be moved into DebugInfoDWARF library.

I'd say it'd be best to not move it into the DebugInfoDWARF library - that'd have the reverse effect, then LLVM's code generation would end up depending on DWARF parsing code it has no need for... - so I guess this'd need to go in a separate library.

The general idea of making these APIs independent of AsmPrinter - yeah, seems plausible, if it's especially necessary/useful - see the first questions.

avl added a comment.Mar 18 2020, 8:40 AM

Could you point to where this is a requirement? (has this already been discussed in review/found to be an unacceptable dependency?

Actually, I did not discuss it with LLD folks.
It looked obvious from the LLD code structure.
But, that my assumption could be wrong. I`ve asked now in D74169.

So I will wait for feedback from them. If AsmPrinter is OK - then no need to do refactoring now.

llvm-dsymutil uses the existing AsmPrinter-based APIs to write the output, yes?)

yes.

avl added a comment.Mar 19 2020, 2:15 PM

Could you point to where this is a requirement? (has this already been discussed in review/found to be an unacceptable dependency? llvm-dsymutil uses the existing AsmPrinter-based APIs to write the output, yes?)

So far, LLD folks generally do not object adding AsmPrinter if overall impact would be acceptable.

I think I could continue with AsmPrinter for now. Though it looks a bit overcomplicated solution. Simple writer like llvm/include/llvm/DebugInfo/GSYM/FileWriter.h would be more appropriate solution generally.

avl added a comment.Mar 21 2020, 3:09 AM

Such a refactoring could make sense in the future. Currently it would be OK to use AsmPrinter, thus I abandon the review.

avl abandoned this revision.Mar 21 2020, 3:10 AM