This is an archive of the discontinued LLVM Phabricator instance.

[DWP] Refactoring llvm-dwp in to a library.
ClosedPublic

Authored by ayermolo on Jul 16 2021, 4:49 PM.

Details

Summary

This is a step1, mechanical refactor, of moving the bulk of llvm-dwp functionality in to a library. This should allow other tools, like BOLT, to re-use some of the llvm-dwp functionality.

Diff Detail

Event Timeline

ayermolo created this revision.Jul 16 2021, 4:49 PM
ayermolo requested review of this revision.Jul 16 2021, 4:49 PM
Herald added a project: Restricted Project. · View Herald Transcript
ayermolo updated this revision to Diff 359498.Jul 16 2021, 5:19 PM

Removing some includes.

No dobut we've discussed this before, but could you refresh my memory on why bolt needs access to dwp rewriting?

Is it possible we could emit bolt-compatible DWARF instead (I imagine we'll want to do this for Propeller anyway) - for instance if Bolt can change the length of a code range (so DWARF's use of offsets relative to known addresses would become inaccurate) perhaps when code generating we could know that those ranges could change and instead emit addresses rather than offsets? Thoese addresses would go in the address pool and be modifiable by Bolt without having to rewrite the dwp (this will significantly increase the .o/executable size of the DWARF due to the use of more address relocations, but may still be worthwhile/the right direction, possibly).

No dobut we've discussed this before, but could you refresh my memory on why bolt needs access to dwp rewriting?

Is it possible we could emit bolt-compatible DWARF instead (I imagine we'll want to do this for Propeller anyway) - for instance if Bolt can change the length of a code range (so DWARF's use of offsets relative to known addresses would become inaccurate) perhaps when code generating we could know that those ranges could change and instead emit addresses rather than offsets? Thoese addresses would go in the address pool and be modifiable by Bolt without having to rewrite the dwp (this will significantly increase the .o/executable size of the DWARF due to the use of more address relocations, but may still be worthwhile/the right direction, possibly).

The main motivation is to simplify the compilation pipeline, with some speed up. Right now we build pre-bolt binary with DWO sections in .dwo/.o, then we run bolt which writes out modified .dwo. Final step we run llvm-dwp which reads all of this in, then writes out again. With bolt supporting dwp as output we can short circuit running llvm-dwp and just write it out directly.
Supporting DWP as input, is mainly just to make it feature complete. It was a low hanging fruit.

I guess on thing that can simplify BOLT. In some cases skeleton CU doesn't have DW_AT_GNU_ranges_base. In general bolt converts DW_AT_low_pc/DW_AT_high_pc to ranges. If Skeleton CU doesn't have DW_AT_GNU_ranges_base we replace DW_AT_low_pc with it. Except DW_AT_GNU_ranges_base doesn't fit in 8 bits. So Abbrev section can't be patched, and I ended up re-writing it. Not sure what can be done there. Maybe llvm can always emit DW_AT_GNU_ranges_base in Skeleton CU?

No dobut we've discussed this before, but could you refresh my memory on why bolt needs access to dwp rewriting?

Is it possible we could emit bolt-compatible DWARF instead (I imagine we'll want to do this for Propeller anyway) - for instance if Bolt can change the length of a code range (so DWARF's use of offsets relative to known addresses would become inaccurate) perhaps when code generating we could know that those ranges could change and instead emit addresses rather than offsets? Thoese addresses would go in the address pool and be modifiable by Bolt without having to rewrite the dwp (this will significantly increase the .o/executable size of the DWARF due to the use of more address relocations, but may still be worthwhile/the right direction, possibly).

The main motivation is to simplify the compilation pipeline, with some speed up. Right now we build pre-bolt binary with DWO sections in .dwo/.o, then we run bolt which writes out modified .dwo. Final step we run llvm-dwp which reads all of this in, then writes out again. With bolt supporting dwp as output we can short circuit running llvm-dwp and just write it out directly.
Supporting DWP as input, is mainly just to make it feature complete. It was a low hanging fruit.

Is there no chance of the compiler knowing ahead of time what the cut-points are and emitting DWARF that already accounts for that?

If it's only about ranges - yeah, we could have a mode where /everything/ is described by a range, though Split DWARF doesn't provide a way for DIEs in the dwo to references ranges in the linked binary (they reference .debug_rnglists.dwo instead) - even if those ranges used startx_endx encodings, you couldn't change the number of ranges needed to describe something (so you couldn't split a contiguous sequence into a range by only modifying the linked debug info, without touching the .dwo/dwp). So, yeah, without something more like Propeller, where the cut-points are known at compile-time, modifying dwo/dwp seems necessary.

Thanks for walking me through it again.

Perhaps this should be in a subdirectory of "DebugInfo" rather than in its own top level directory? Eh, I guess if DWARFLinker is in its own top-level directory then so could this.

Could you split the change up a bit - do some/all of the library refactoring in-place before moving directories? It'd make it easier to separate/undertsand the changes.

ayermolo updated this revision to Diff 359849.Jul 19 2021, 11:20 AM

Breaking up in to two patches. This one just refactores code in to seperate files.

Herald added a project: Restricted Project. · View Herald TranscriptJul 19 2021, 11:20 AM
Herald added a reviewer: Restricted Project. · View Herald Transcript
ayermolo updated this revision to Diff 359852.Jul 19 2021, 11:23 AM

minor cleanup

ayermolo removed a reviewer: Restricted Project.Jul 19 2021, 11:24 AM

Not sure why reviews tags both llvm-dwp and DWPLibrary.cpp as copied from each other.

No dobut we've discussed this before, but could you refresh my memory on why bolt needs access to dwp rewriting?

Is it possible we could emit bolt-compatible DWARF instead (I imagine we'll want to do this for Propeller anyway) - for instance if Bolt can change the length of a code range (so DWARF's use of offsets relative to known addresses would become inaccurate) perhaps when code generating we could know that those ranges could change and instead emit addresses rather than offsets? Thoese addresses would go in the address pool and be modifiable by Bolt without having to rewrite the dwp (this will significantly increase the .o/executable size of the DWARF due to the use of more address relocations, but may still be worthwhile/the right direction, possibly).

The main motivation is to simplify the compilation pipeline, with some speed up. Right now we build pre-bolt binary with DWO sections in .dwo/.o, then we run bolt which writes out modified .dwo. Final step we run llvm-dwp which reads all of this in, then writes out again. With bolt supporting dwp as output we can short circuit running llvm-dwp and just write it out directly.
Supporting DWP as input, is mainly just to make it feature complete. It was a low hanging fruit.

Is there no chance of the compiler knowing ahead of time what the cut-points are and emitting DWARF that already accounts for that?

If it's only about ranges - yeah, we could have a mode where /everything/ is described by a range, though Split DWARF doesn't provide a way for DIEs in the dwo to references ranges in the linked binary (they reference .debug_rnglists.dwo instead) - even if those ranges used startx_endx encodings, you couldn't change the number of ranges needed to describe something (so you couldn't split a contiguous sequence into a range by only modifying the linked debug info, without touching the .dwo/dwp). So, yeah, without something more like Propeller, where the cut-points are known at compile-time, modifying dwo/dwp seems necessary.

Thanks for walking me through it again.

Perhaps this should be in a subdirectory of "DebugInfo" rather than in its own top level directory? Eh, I guess if DWARFLinker is in its own top-level directory then so could this.

Could you split the change up a bit - do some/all of the library refactoring in-place before moving directories? It'd make it easier to separate/undertsand the changes.

Bolt introduces new ranges by splitting hot/cold sections, etc.
Side note. Currently BOLT only supports DWARF 4. I know GCC turned on DWARF 5 by default. Are there plans on llvm side?

No dobut we've discussed this before, but could you refresh my memory on why bolt needs access to dwp rewriting?

Is it possible we could emit bolt-compatible DWARF instead (I imagine we'll want to do this for Propeller anyway) - for instance if Bolt can change the length of a code range (so DWARF's use of offsets relative to known addresses would become inaccurate) perhaps when code generating we could know that those ranges could change and instead emit addresses rather than offsets? Thoese addresses would go in the address pool and be modifiable by Bolt without having to rewrite the dwp (this will significantly increase the .o/executable size of the DWARF due to the use of more address relocations, but may still be worthwhile/the right direction, possibly).

The main motivation is to simplify the compilation pipeline, with some speed up. Right now we build pre-bolt binary with DWO sections in .dwo/.o, then we run bolt which writes out modified .dwo. Final step we run llvm-dwp which reads all of this in, then writes out again. With bolt supporting dwp as output we can short circuit running llvm-dwp and just write it out directly.
Supporting DWP as input, is mainly just to make it feature complete. It was a low hanging fruit.

Is there no chance of the compiler knowing ahead of time what the cut-points are and emitting DWARF that already accounts for that?

If it's only about ranges - yeah, we could have a mode where /everything/ is described by a range, though Split DWARF doesn't provide a way for DIEs in the dwo to references ranges in the linked binary (they reference .debug_rnglists.dwo instead) - even if those ranges used startx_endx encodings, you couldn't change the number of ranges needed to describe something (so you couldn't split a contiguous sequence into a range by only modifying the linked debug info, without touching the .dwo/dwp). So, yeah, without something more like Propeller, where the cut-points are known at compile-time, modifying dwo/dwp seems necessary.

Thanks for walking me through it again.

Perhaps this should be in a subdirectory of "DebugInfo" rather than in its own top level directory? Eh, I guess if DWARFLinker is in its own top-level directory then so could this.

Could you split the change up a bit - do some/all of the library refactoring in-place before moving directories? It'd make it easier to separate/undertsand the changes.

Bolt introduces new ranges by splitting hot/cold sections, etc.
Side note. Currently BOLT only supports DWARF 4. I know GCC turned on DWARF 5 by default. Are there plans on llvm side?

Haven't heard any particular discussions - Google switched to DWARFv5 by default several months ago, haven't heard Sony or Apple discussing it.

llvm/tools/llvm-dwp/CMakeLists.txt
8

I'd probably call this DWP.cpp to match DWP.h?

ayermolo updated this revision to Diff 359898.Jul 19 2021, 1:40 PM

change to DWP.cpp

dblaikie accepted this revision.Jul 19 2021, 2:13 PM

Sounds good

This revision is now accepted and ready to land.Jul 19 2021, 2:13 PM
ayermolo updated this revision to Diff 359940.Jul 19 2021, 3:18 PM

fix clang-tidy

ayermolo updated this revision to Diff 359982.Jul 19 2021, 6:37 PM

rebase on tot

ayermolo updated this revision to Diff 360187.Jul 20 2021, 10:08 AM

rebase on latest

@dblaikie can I land or should I wait for @jdoerfert
Looking at failures, they look unrelated

@dblaikie can I land or should I wait for @jdoerfert
Looking at failures, they look unrelated

nah, that's fine - you can go ahead. Please update the patch description/commit message to reflect that this patch isn't creating a library, but is the preliminary changes within the llvm-dwp binary.

ayermolo edited the summary of this revision. (Show Details)Jul 20 2021, 5:10 PM
This revision was landed with ongoing or failed builds.Jul 20 2021, 5:19 PM
This revision was automatically updated to reflect the committed changes.