This is an archive of the discontinued LLVM Phabricator instance.

[Debugify] Limit number of processed functions for original mode
ClosedPublic

Authored by ntesic on Dec 14 2021, 1:42 AM.

Details

Summary

Debugify in OriginalDebugInfo mode, does (DebugInfo) collect-before-pass & check-after-pass
for each instruction, which is pretty expensive. When used to analyze DebugInfo losses
in large projects (like LLVM), this raises the build time unacceptably.
This patch introduces a limit for the number of processed functions per compile unit.
By default, the limit is set to UINT_MAX (practically unlimited), and by using the introduced
option -debugify-func-limit the limit could be set to any positive integer number.

Diff Detail

Event Timeline

ntesic created this revision.Dec 14 2021, 1:42 AM
ntesic requested review of this revision.Dec 14 2021, 1:42 AM
Herald added a project: Restricted Project. · View Herald TranscriptDec 14 2021, 1:42 AM

Performance seems like a serious issue when working with large projects here, but I have some questions/thoughts about this approach:

Is there any good reason to have an option to set the limit to zero? Unless I'm missing something, that would be equivalent to just not running debugify at all, which seems like a redundant option to have when debugify is an optional flag to begin with.

More generally, it's probably good to have an optional limit for OriginalDIMode, but I think it would be better if the limit could be passed as an integer instead of an enum. While 10000 may be a good default value for the builds you mentioned in the description, this is only with respect to your hardware setup and time constraints, and even then may be different for other builds (or even different optimization pipelines). Other users may have a higher or lower limit on the number of instructions they can afford to have processed by debugify, and should be able to specify this exactly via command line.

Also this is just my opinion and so I'd like to see what other reviewers think, but personally I think it would be best if the default setting was unlimited (so the limit is purely optional). When we're talking about metric-gathering, I think it's best to have the program do exactly what it says on the tin/what the user would expect it to do, and silently skipping instructions unless you pass an additional flag seems like potentially harmful unexpected behaviour. The documentation should be updated to include this option so that if a user does encounter performance issues then they will know that the option is there, but they will never unknowingly get incomplete results because they didn't know about this flag.

Hi @StephenTozer, sorry for the delay, I was AFK.
Thanks for your comments.

Performance seems like a serious issue when working with large projects here, but I have some questions/thoughts about this approach:

Definitely, and large projects are precious to us, for catching Debug Info Losses in the LLVM pipeline.

Is there any good reason to have an option to set the limit to zero? Unless I'm missing something, that would be equivalent to just not running debugify at all, which seems like a redundant option to have when debugify is an optional flag to begin with.

The zero limit was useful for comparing build times during development of these patches, but I agree that it is not very useful for users.

More generally, it's probably good to have an optional limit for OriginalDIMode, but I think it would be better if the limit could be passed as an integer instead of an enum. While 10000 may be a good default value for the builds you mentioned in the description, this is only with respect to your hardware setup and time constraints, and even then may be different for other builds (or even different optimization pipelines). Other users may have a higher or lower limit on the number of instructions they can afford to have processed by debugify, and should be able to specify this exactly via command line.

Explicitly setting the instruction limit number instead of using preset number, means the user should know its system constraints. This probably makes sense.

Also this is just my opinion and so I'd like to see what other reviewers think, but personally I think it would be best if the default setting was unlimited (so the limit is purely optional). When we're talking about metric-gathering, I think it's best to have the program do exactly what it says on the tin/what the user would expect it to do, and silently skipping instructions unless you pass an additional flag seems like potentially harmful unexpected behaviour. The documentation should be updated to include this option so that if a user does encounter performance issues then they will know that the option is there, but they will never unknowingly get incomplete results because they didn't know about this flag.

This patch came from our downstream use case, and in the general use case, all your comments make more sense. Thanks again!

ntesic updated this revision to Diff 418545.EditedMar 28 2022, 4:52 AM
ntesic retitled this revision from [Debugify] Limit number of processed instructions for original mode to [Debugify] Limit number of processed functions for original mode.
ntesic edited the summary of this revision. (Show Details)

Set limitation granularity to the function level instead of instruction level.

  • After latest update of D115622, we decide whether to use already collected Debug Info at the Function level, instead of the (whole) Module level. This is important, since the set of observed Functions in the same Module is not the equivalent for each pass in the LLVM pipeline. This update of the patch introduces the limit number of the observed Functions in the -verify-each-debuginfo-preserve pipeline.
  • By default, consider unlimited number of Functions
  • Set any number as a limit using the -debugify-func-limit option
  • Rebase

Sorry for the big delay, and thanks @StephenTozer for the comments!

Herald added a project: Restricted Project. · View Herald TranscriptMar 28 2022, 4:52 AM
ntesic edited the summary of this revision. (Show Details)Mar 28 2022, 7:09 AM
ntesic updated this revision to Diff 421501.Apr 8 2022, 5:36 AM
  • Add new argument usage to HowToUpdateDebugInfo documentation

Thanks @djtodoro!

ntesic updated this revision to Diff 421505.Apr 8 2022, 5:55 AM
djtodoro accepted this revision.Apr 8 2022, 5:56 AM
This revision is now accepted and ready to land.Apr 8 2022, 5:56 AM
This revision was landed with ongoing or failed builds.Apr 21 2022, 5:00 AM
This revision was automatically updated to reflect the committed changes.