Page MenuHomePhabricator

[X86][FastISEL] Support DW_TAG_call_site_parameter with FastISEL
Needs ReviewPublic

Authored by alok on Mar 23 2021, 2:21 AM.

Details

Summary

DW_OP_entry_value and DW_TAG_call_site_parameter work together to preserve optimized out paramters value for debugging purpose.
Currently DW_OP_entry_value is generated at all the options but DW_TAG_call_site_parameter is not generated with FastISEL.
This causes debuggers to not be able to show optimized out parameters value with FastISEL.

Adding support to it now.

Diff Detail

Event Timeline

alok created this revision.Mar 23 2021, 2:21 AM
alok requested review of this revision.Mar 23 2021, 2:21 AM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMar 23 2021, 2:21 AM

Cool -- thanks for working on this!

clang/lib/Frontend/CompilerInvocation.cpp
1645 ↗(On Diff #332572)

I think this should be a different patch.

Are you saying this is useful for some -O0 cases? Can you please provide a test case? Is this related to Fortran only?

alok added inline comments.Mar 23 2021, 10:16 PM
clang/lib/Frontend/CompilerInvocation.cpp
1645 ↗(On Diff #332572)

Sure. I shall move this in separate patch.

Yes this is useful in -O0 case. the testcase is added with the current patch as "callsitepar-fastisel.ll". Though the IR is generated from fortran program (mentioned in test case). But I think it is not specific to fortran. Since even with -O0 many optimizations work, it is very much possible that parameter gets optimized out.

alok edited the summary of this revision. (Show Details)Mar 23 2021, 10:17 PM
alok updated this revision to Diff 332867.Mar 23 2021, 10:20 PM

Updated to incorporate comments from @djtodoro

  • to split out separate patch for clang change.

and fixed clang-format issue.

alok updated this revision to Diff 334636.Apr 1 2021, 3:27 AM

Re-based and minor changes.

FastISel is normally used only at -O0, I wouldn't expect any parameters to be optimized out at -O0.
The test is running llc with default optimization, which is -O2, and forcing fast-isel; this is not a usual combination and I wouldn't expect us to spend any effort making sure debug info works well with that combination.

If flang is forcing fast-isel even with optimization, that sounds like a problem for flang to solve, not LLVM.

alok updated this revision to Diff 334931.Apr 2 2021, 3:07 AM

Updated testcase to use option "-O0", earlier default option was being used instead.

alok added a comment.Apr 2 2021, 3:16 AM

FastISel is normally used only at -O0, I wouldn't expect any parameters to be optimized out at -O0.
The test is running llc with default optimization, which is -O2, and forcing fast-isel; this is not a usual combination and I wouldn't expect us to spend any effort making sure debug info works well with that combination.

If flang is forcing fast-isel even with optimization, that sounds like a problem for flang to solve, not LLVM.

Thanks for your comments. I am sorry as the earlier used options in test case were more confusing than helpful. I have changed that now.
The updated testcase at "-O0" with FastISel does have optimized out parameter, this is because generation of "DW_OP_GNU_entry_value" is enabled starting from "-O0". This causes llvm to generate it whenever it gets opportunity (more local variables to preserve may be the reason).
The concern here is only DW_OP_GNU_entry_value is not sufficient and should be supported with "DW_TAG_GNU_call_site_parameter", which I tried in current patch.

FastISel is normally used only at -O0, I wouldn't expect any parameters to be optimized out at -O0.
The test is running llc with default optimization, which is -O2, and forcing fast-isel; this is not a usual combination and I wouldn't expect us to spend any effort making sure debug info works well with that combination.

If flang is forcing fast-isel even with optimization, that sounds like a problem for flang to solve, not LLVM.

Thanks for your comments. I am sorry as the earlier used options in test case were more confusing than helpful. I have changed that now.
The updated testcase at "-O0" with FastISel does have optimized out parameter, this is because generation of "DW_OP_GNU_entry_value" is enabled starting from "-O0". This causes llvm to generate it whenever it gets opportunity (more local variables to preserve may be the reason).
The concern here is only DW_OP_GNU_entry_value is not sufficient and should be supported with "DW_TAG_GNU_call_site_parameter", which I tried in current patch.

"DW_OP_GNU_entry_value" is enabled starting from "-O0".

I am wondering if this is true...

alok added a comment.Apr 2 2021, 4:29 AM

FastISel is normally used only at -O0, I wouldn't expect any parameters to be optimized out at -O0.
The test is running llc with default optimization, which is -O2, and forcing fast-isel; this is not a usual combination and I wouldn't expect us to spend any effort making sure debug info works well with that combination.

If flang is forcing fast-isel even with optimization, that sounds like a problem for flang to solve, not LLVM.

Thanks for your comments. I am sorry as the earlier used options in test case were more confusing than helpful. I have changed that now.
The updated testcase at "-O0" with FastISel does have optimized out parameter, this is because generation of "DW_OP_GNU_entry_value" is enabled starting from "-O0". This causes llvm to generate it whenever it gets opportunity (more local variables to preserve may be the reason).
The concern here is only DW_OP_GNU_entry_value is not sufficient and should be supported with "DW_TAG_GNU_call_site_parameter", which I tried in current patch.

"DW_OP_GNU_entry_value" is enabled starting from "-O0".

I am wondering if this is true...

You can try with testcase attached with this patch.

bash$ llc -O0 -filetype=obj llvm/test/DebugInfo/X86/callsitepar-fastisel.ll -o callsitepar-fastisel.o
bash$ llvm-dwarfdump callsitepar-fastisel.o | grep DW_OP_GNU_entry_value

[0x00000000000000bf, 0x0000000000000101): DW_OP_GNU_entry_value(DW_OP_reg4 RSI))

FastISel is normally used only at -O0, I wouldn't expect any parameters to be optimized out at -O0.
The test is running llc with default optimization, which is -O2, and forcing fast-isel; this is not a usual combination and I wouldn't expect us to spend any effort making sure debug info works well with that combination.

If flang is forcing fast-isel even with optimization, that sounds like a problem for flang to solve, not LLVM.

Thanks for your comments. I am sorry as the earlier used options in test case were more confusing than helpful. I have changed that now.
The updated testcase at "-O0" with FastISel does have optimized out parameter, this is because generation of "DW_OP_GNU_entry_value" is enabled starting from "-O0". This causes llvm to generate it whenever it gets opportunity (more local variables to preserve may be the reason).
The concern here is only DW_OP_GNU_entry_value is not sufficient and should be supported with "DW_TAG_GNU_call_site_parameter", which I tried in current patch.

"DW_OP_GNU_entry_value" is enabled starting from "-O0".

I am wondering if this is true...

You can try with testcase attached with this patch.

bash$ llc -O0 -filetype=obj llvm/test/DebugInfo/X86/callsitepar-fastisel.ll -o callsitepar-fastisel.o
bash$ llvm-dwarfdump callsitepar-fastisel.o | grep DW_OP_GNU_entry_value

[0x00000000000000bf, 0x0000000000000101): DW_OP_GNU_entry_value(DW_OP_reg4 RSI))

OK, thanks! We should fix that.

I think that the Debug Entry Values feature should not be enabled by default for non optimized code, so the TargetOptions::ShouldEmitDebugEntryValues() should be patched with checking of optimization level (it should be > 0).

I think that the Debug Entry Values feature should not be enabled by default for non optimized code, so the TargetOptions::ShouldEmitDebugEntryValues() should be patched with checking of optimization level (it should be > 0).

That's currently intended to be already handled by the frontend, right? (clang only sets EnableDebugEntryValues (which ShouldEmitDebugEntryValues checks (hmm, it checks under 'or', not 'and', so I'm not sure where the "only above -O0" is implemented, but it is implemented somewhere?) if optimizations are enabled, yeah?)

Oh, is entry_values actually not conditionalized? It's only the call_site support that's currently conditionalized on "above -O0"?

Hmm - If that's the case, and we currently have some cases where entry_values are emitted at -O0, I'm not sure /not/ emitting those is the right call either. If we believe/have data to show that there are so few useful uses of entry_value at -O0 that it's not worth the DWARF size growth to put call_sites in at -O0, then I think it might still be worth leaving the entry_values in (unless they take up a bunch of extra space) for the cases of mixed optimization compilation (-O0 some code you're debugging, but building the rest with optimizations).

I think that the Debug Entry Values feature should not be enabled by default for non optimized code, so the TargetOptions::ShouldEmitDebugEntryValues() should be patched with checking of optimization level (it should be > 0).

That's currently intended to be already handled by the frontend, right? (clang only sets EnableDebugEntryValues (which ShouldEmitDebugEntryValues checks (hmm, it checks under 'or', not 'and', so I'm not sure where the "only above -O0" is implemented, but it is implemented somewhere?) if optimizations are enabled, yeah?)

Oh, is entry_values actually not conditionalized? It's only the call_site support that's currently conditionalized on "above -O0"?

Looks like there is no explicit check of optimization level (above "-O0"), neither on frontend nor backend for entry-values generation. I think it is the situation since there should not be any optimization (at least that I am aware of, in the case of C/C++) that would cause the entry-values generation...

Hmm - If that's the case, and we currently have some cases where entry_values are emitted at -O0, I'm not sure /not/ emitting those is the right call either. If we believe/have data to show that there are so few useful uses of entry_value at -O0 that it's not worth the DWARF size growth to put call_sites in at -O0, then I think it might still be worth leaving the entry_values in (unless they take up a bunch of extra space) for the cases of mixed optimization compilation (-O0 some code you're debugging, but building the rest with optimizations).

Yeah... That is valuable example... I am thinking in that direction as well, and I am closer to the enabling it for -O0 case if that is useful (and there is no dramatic cost in terms of DWARF size).

I think that the Debug Entry Values feature should not be enabled by default for non optimized code, so the TargetOptions::ShouldEmitDebugEntryValues() should be patched with checking of optimization level (it should be > 0).

That's currently intended to be already handled by the frontend, right? (clang only sets EnableDebugEntryValues (which ShouldEmitDebugEntryValues checks (hmm, it checks under 'or', not 'and', so I'm not sure where the "only above -O0" is implemented, but it is implemented somewhere?) if optimizations are enabled, yeah?)

Oh, is entry_values actually not conditionalized? It's only the call_site support that's currently conditionalized on "above -O0"?

Looks like there is no explicit check of optimization level (above "-O0"), neither on frontend nor backend for entry-values generation. I think it is the situation since there should not be any optimization (at least that I am aware of, in the case of C/C++) that would cause the entry-values generation...

Hmm - If that's the case, and we currently have some cases where entry_values are emitted at -O0, I'm not sure /not/ emitting those is the right call either. If we believe/have data to show that there are so few useful uses of entry_value at -O0 that it's not worth the DWARF size growth to put call_sites in at -O0, then I think it might still be worth leaving the entry_values in (unless they take up a bunch of extra space) for the cases of mixed optimization compilation (-O0 some code you're debugging, but building the rest with optimizations).

Yeah... That is valuable example... I am thinking in that direction as well, and I am closer to the enabling it for -O0 case if that is useful (and there is no dramatic cost in terms of DWARF size).

Does anyone have this example (where DW_OP_entry_value is used at -O0)? It'd be great to look at it & see if it's a case of unnecessarily losing the location, or legitimately losing it and using entry_value for best-effort recovery (& then a question of whether the loss is appropriate at -O0, or if we want to pessimize -O0 further to avoid the loss).

I'd worry about turning on call_sites at -O0 - there'd be a lot more calls (especially for C++ code with lots of implicit operations), but numbers will be needed in any case, so not worth much speculation.

alok added a comment.Mon, Apr 19, 10:01 AM

I think that the Debug Entry Values feature should not be enabled by default for non optimized code, so the TargetOptions::ShouldEmitDebugEntryValues() should be patched with checking of optimization level (it should be > 0).

That's currently intended to be already handled by the frontend, right? (clang only sets EnableDebugEntryValues (which ShouldEmitDebugEntryValues checks (hmm, it checks under 'or', not 'and', so I'm not sure where the "only above -O0" is implemented, but it is implemented somewhere?) if optimizations are enabled, yeah?)

Oh, is entry_values actually not conditionalized? It's only the call_site support that's currently conditionalized on "above -O0"?

Looks like there is no explicit check of optimization level (above "-O0"), neither on frontend nor backend for entry-values generation. I think it is the situation since there should not be any optimization (at least that I am aware of, in the case of C/C++) that would cause the entry-values generation...

Hmm - If that's the case, and we currently have some cases where entry_values are emitted at -O0, I'm not sure /not/ emitting those is the right call either. If we believe/have data to show that there are so few useful uses of entry_value at -O0 that it's not worth the DWARF size growth to put call_sites in at -O0, then I think it might still be worth leaving the entry_values in (unless they take up a bunch of extra space) for the cases of mixed optimization compilation (-O0 some code you're debugging, but building the rest with optimizations).

Yeah... That is valuable example... I am thinking in that direction as well, and I am closer to the enabling it for -O0 case if that is useful (and there is no dramatic cost in terms of DWARF size).

Does anyone have this example (where DW_OP_entry_value is used at -O0)? It'd be great to look at it & see if it's a case of unnecessarily losing the location, or legitimately losing it and using entry_value for best-effort recovery (& then a question of whether the loss is appropriate at -O0, or if we want to pessimize -O0 further to avoid the loss).

I'd worry about turning on call_sites at -O0 - there'd be a lot more calls (especially for C++ code with lots of implicit operations), but numbers will be needed in any case, so not worth much speculation.

Sorry for late response.
I tried building https://github.com/flang-compiler/classic-flang-llvm-project.git (branch release_11x) with compiler (current patch (https://reviews.llvm.org/D99160) and https://reviews.llvm.org/D99238) with -O0 -g .
Interestingly there was no difference.
Reason: https://reviews.llvm.org/D99238 is not sufficient for clang/clang++ to enable call-site generation with FastISel though it is sufficient for Flang compiler.
Below additional patch is needed to generate call-sites

`````````````````````````````````````

diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index a77f52bd235b..8d4e11faa018 100644

  • a/clang/lib/CodeGen/CGDebugInfo.cpp

+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -5149,9 +5149,9 @@ llvm::DebugLoc CGDebugInfo::SourceLocToDebugLoc(SourceLocation Loc) {
}

llvm::DINode::DIFlags CGDebugInfo::getCallSiteRelatedAttrs() const {

  • // Call site-related attributes are only useful in optimized programs, and
  • // when there's a possibility of debugging backtraces.
  • if (!CGM.getLangOpts().Optimize || DebugKind == codegenoptions::NoDebugInfo ||

+ Call site-related attributes are useful when there's a possibility of
+
debugging backtraces.
+ if (DebugKind == codegenoptions::NoDebugInfo ||

  DebugKind == codegenoptions::LocTrackingOnly)
return llvm::DINode::FlagZero;
`````````````````````````````````````

With this patch Clang/Clang++ turn on LLVM IR flag "DIFlagAllCallsDescribed", in absence of this LLVM does not generate call-site.

With the above patch applied below is the comparison of sizes of shared libraries.

Name of shared library - Size (without callsite) - Size (with callsites) - % incresase in size

```````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````

PipSqueak.so 73192 75048 2%
SecondLib.so 73192 75048 2%
TestPlugin.so 1694024 1700704 0%
libLLVMDlltoolDriver.so 336568 347872 3%
libLLVMDebugInfoPDB.so 14463832 15360784 6%
libLLVMOrcError.so 108880 111184 2%
libLLVMTarget.so 2645104 2677448 1%
libLLVMFrontendOpenMP.so 2354728 2505232 6%
libLLVMProfileData.so 7901688 8373632 5%
libLLVMOrcJIT.so 28838432 30490640 5%
libLLVMRemarks.so 3311680 3551856 7%
libgtest.so 2374120 2523112 6%
libLLVMDemangle.so 1350616 1490832 10%
libLLVMAsmParser.so 6961216 7366040 5%
LLVMHello.so 394624 396120 0%
libLLVMGlobalISel.so 18886648 19698008 4%
libLLVMDebugInfoMSF.so 1288376 1365040 5%
libLLVMCoverage.so 4225224 4502104 6%
libLLVMFuzzMutate.so 3859968 3973808 2%
libRemarks.so 6696 6696 0%
libLLVMDebugInfoDWARF.so 11914848 12750784 7%
libLLVMMCParser.so 6419464 6873000 7%
libLLVMTableGen.so 4855536 5180760 6%
libLLVMDWARFLinker.so 5407528 5628576 4%
Bye.so 1858848 1872672 0%
libLLVMMCJIT.so 1470952 1526544 3%
libLLVMMC.so 16931504 17741376 4%
libLLVMipo.so 43554712 46019392 5%
libLLVMLineEditor.so 208216 216360 3%
libbenchmark_main.so 18904 19408 2%
libbenchmark.so 3308304 3507632 6%
libLTO.so 2240720 2277408 1%
libLLVMInterpreter.so 2614696 2749128 5%
libLLVMTransformUtils.so 47925248 50476512 5%
libLLVMX86Desc.so 8047928 8213152 2%
libLLVMCoroutines.so 6478080 6766880 4%
libLLVMJITLink.so 5590936 6066736 8%
libLLVMVectorize.so 19557808 20665544 5%
libLLVMX86Disassembler.so 2820056 2849376 1%
libLLVMBitReader.so 8282648 8823240 6%
libLLVMMCA.so 3242016 3405624 5%
libLLVMBitWriter.so 6544032 6867976 4%
libLLVMMIRParser.so 5739688 5980104 4%
libLLVMLTO.so 13272272 13786192 3%
libLLVMCore.so 46109224 48840008 5%
libLLVMBitstreamReader.so 561896 600624 6%
libLLVMObjectYAML.so 23110160 24648160 6%
libLLVMSupport.so 20349728 21953112 7%
libLLVMIRReader.so 1215672 1237960 1%
libLLVMX86Info.so 76488 76624 0%
libLLVMSelectionDAG.so 34358968 36876128 7%
libLLVMExecutionEngine.so 2962160 3073224 3%
libLLVMSymbolize.so 1980760 2089728 5%
libLLVMPasses.so 18574960 19459712 4%
libLLVMOption.so 869784 920976 5%
libLLVMObject.so 15138656 16383168 8%
libLLVMTextAPI.so 3191272 3384560 6%
libLLVMX86CodeGen.so 55202744 58068088 5%
libLLVMAggressiveInstCombine.so 2354936 2419048 2%
libLLVMExtensions.so 23904 23904 0%
libLLVMWindowsManifest.so 351608 370168 5%
libLLVMObjCARCOpts.so 4964488 5150120 3%
libLLVMBinaryFormat.so 1325752 1431448 7%
libLLVMDebugInfoGSYM.so 2909272 3094616 6%
libLLVMTestingSupport.so 623608 656640 5%
libgtest_main.so 42856 43608 1%
libLLVMLinker.so 3070264 3210248 4%
libLLVMCFGuard.so 1082488 1096104 1%
libLLVMCodeGen.so 139859816 146452160 4%
libLLVMDebugInfoCodeView.so 7742896 8497704 9%
libLLVMX86AsmParser.so 2590816 2714008 4%
libLLVMRuntimeDyld.so 6592016 7094048 7%
libLLVMInstCombine.so 18194728 19705184 8%
BugpointPasses.so 780112 788304 1%
libLLVMScalarOpts.so 73110536 77534912 6%
libLLVMXRay.so 3993696 4251280 6%
libLLVMMCDisassembler.so 1292912 1313536 1%
libLLVMFrontendOpenACC.so 96008 102656 6%
libLLVMInstrumentation.so 21038808 22351384 6%
libLLVMLibDriver.so 1595936 1638704 2%
libLLVMAsmPrinter.so 25818000 26842816 3%
libLLVMAnalysis.so 79615056 83995856 5%

`````````````````````````````````````````````````````````````````````

sum 79615057 83995857 5%

So the conclusion is,

  • We have a Flag which can help us if we want to enable callsite generation selectively (only for Flang)
  • If we are fine with 5% increase in size, we can enable call-site generation by default.

I think that the Debug Entry Values feature should not be enabled by default for non optimized code, so the TargetOptions::ShouldEmitDebugEntryValues() should be patched with checking of optimization level (it should be > 0).

That's currently intended to be already handled by the frontend, right? (clang only sets EnableDebugEntryValues (which ShouldEmitDebugEntryValues checks (hmm, it checks under 'or', not 'and', so I'm not sure where the "only above -O0" is implemented, but it is implemented somewhere?) if optimizations are enabled, yeah?)

Oh, is entry_values actually not conditionalized? It's only the call_site support that's currently conditionalized on "above -O0"?

Looks like there is no explicit check of optimization level (above "-O0"), neither on frontend nor backend for entry-values generation. I think it is the situation since there should not be any optimization (at least that I am aware of, in the case of C/C++) that would cause the entry-values generation...

Hmm - If that's the case, and we currently have some cases where entry_values are emitted at -O0, I'm not sure /not/ emitting those is the right call either. If we believe/have data to show that there are so few useful uses of entry_value at -O0 that it's not worth the DWARF size growth to put call_sites in at -O0, then I think it might still be worth leaving the entry_values in (unless they take up a bunch of extra space) for the cases of mixed optimization compilation (-O0 some code you're debugging, but building the rest with optimizations).

Yeah... That is valuable example... I am thinking in that direction as well, and I am closer to the enabling it for -O0 case if that is useful (and there is no dramatic cost in terms of DWARF size).

Does anyone have this example (where DW_OP_entry_value is used at -O0)? It'd be great to look at it & see if it's a case of unnecessarily losing the location, or legitimately losing it and using entry_value for best-effort recovery (& then a question of whether the loss is appropriate at -O0, or if we want to pessimize -O0 further to avoid the loss).

I think this ^ still needs understanding/investigation. Do you have an example with OP_entry_value at -O0?

I'd worry about turning on call_sites at -O0 - there'd be a lot more calls (especially for C++ code with lots of implicit operations), but numbers will be needed in any case, so not worth much speculation.

Sorry for late response.
I tried building https://github.com/flang-compiler/classic-flang-llvm-project.git (branch release_11x) with compiler (current patch (https://reviews.llvm.org/D99160) and https://reviews.llvm.org/D99238) with -O0 -g .
Interestingly there was no difference.
Reason: https://reviews.llvm.org/D99238 is not sufficient for clang/clang++ to enable call-site generation with FastISel though it is sufficient for Flang compiler.
Below additional patch is needed to generate call-sites

`````````````````````````````````````

diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index a77f52bd235b..8d4e11faa018 100644

  • a/clang/lib/CodeGen/CGDebugInfo.cpp

+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -5149,9 +5149,9 @@ llvm::DebugLoc CGDebugInfo::SourceLocToDebugLoc(SourceLocation Loc) {
}

llvm::DINode::DIFlags CGDebugInfo::getCallSiteRelatedAttrs() const {

  • // Call site-related attributes are only useful in optimized programs, and
  • // when there's a possibility of debugging backtraces.
  • if (!CGM.getLangOpts().Optimize || DebugKind == codegenoptions::NoDebugInfo ||

+ Call site-related attributes are useful when there's a possibility of
+
debugging backtraces.
+ if (DebugKind == codegenoptions::NoDebugInfo ||

  DebugKind == codegenoptions::LocTrackingOnly)
return llvm::DINode::FlagZero;
`````````````````````````````````````

With this patch Clang/Clang++ turn on LLVM IR flag "DIFlagAllCallsDescribed", in absence of this LLVM does not generate call-site.

With the above patch applied below is the comparison of sizes of shared libraries.

Name of shared library - Size (without callsite) - Size (with callsites) - % incresase in size

```````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````

PipSqueak.so 73192 75048 2%
SecondLib.so 73192 75048 2%
TestPlugin.so 1694024 1700704 0%
libLLVMDlltoolDriver.so 336568 347872 3%
libLLVMDebugInfoPDB.so 14463832 15360784 6%
libLLVMOrcError.so 108880 111184 2%
libLLVMTarget.so 2645104 2677448 1%
libLLVMFrontendOpenMP.so 2354728 2505232 6%
libLLVMProfileData.so 7901688 8373632 5%
libLLVMOrcJIT.so 28838432 30490640 5%
libLLVMRemarks.so 3311680 3551856 7%
libgtest.so 2374120 2523112 6%
libLLVMDemangle.so 1350616 1490832 10%
libLLVMAsmParser.so 6961216 7366040 5%
LLVMHello.so 394624 396120 0%
libLLVMGlobalISel.so 18886648 19698008 4%
libLLVMDebugInfoMSF.so 1288376 1365040 5%
libLLVMCoverage.so 4225224 4502104 6%
libLLVMFuzzMutate.so 3859968 3973808 2%
libRemarks.so 6696 6696 0%
libLLVMDebugInfoDWARF.so 11914848 12750784 7%
libLLVMMCParser.so 6419464 6873000 7%
libLLVMTableGen.so 4855536 5180760 6%
libLLVMDWARFLinker.so 5407528 5628576 4%
Bye.so 1858848 1872672 0%
libLLVMMCJIT.so 1470952 1526544 3%
libLLVMMC.so 16931504 17741376 4%
libLLVMipo.so 43554712 46019392 5%
libLLVMLineEditor.so 208216 216360 3%
libbenchmark_main.so 18904 19408 2%
libbenchmark.so 3308304 3507632 6%
libLTO.so 2240720 2277408 1%
libLLVMInterpreter.so 2614696 2749128 5%
libLLVMTransformUtils.so 47925248 50476512 5%
libLLVMX86Desc.so 8047928 8213152 2%
libLLVMCoroutines.so 6478080 6766880 4%
libLLVMJITLink.so 5590936 6066736 8%
libLLVMVectorize.so 19557808 20665544 5%
libLLVMX86Disassembler.so 2820056 2849376 1%
libLLVMBitReader.so 8282648 8823240 6%
libLLVMMCA.so 3242016 3405624 5%
libLLVMBitWriter.so 6544032 6867976 4%
libLLVMMIRParser.so 5739688 5980104 4%
libLLVMLTO.so 13272272 13786192 3%
libLLVMCore.so 46109224 48840008 5%
libLLVMBitstreamReader.so 561896 600624 6%
libLLVMObjectYAML.so 23110160 24648160 6%
libLLVMSupport.so 20349728 21953112 7%
libLLVMIRReader.so 1215672 1237960 1%
libLLVMX86Info.so 76488 76624 0%
libLLVMSelectionDAG.so 34358968 36876128 7%
libLLVMExecutionEngine.so 2962160 3073224 3%
libLLVMSymbolize.so 1980760 2089728 5%
libLLVMPasses.so 18574960 19459712 4%
libLLVMOption.so 869784 920976 5%
libLLVMObject.so 15138656 16383168 8%
libLLVMTextAPI.so 3191272 3384560 6%
libLLVMX86CodeGen.so 55202744 58068088 5%
libLLVMAggressiveInstCombine.so 2354936 2419048 2%
libLLVMExtensions.so 23904 23904 0%
libLLVMWindowsManifest.so 351608 370168 5%
libLLVMObjCARCOpts.so 4964488 5150120 3%
libLLVMBinaryFormat.so 1325752 1431448 7%
libLLVMDebugInfoGSYM.so 2909272 3094616 6%
libLLVMTestingSupport.so 623608 656640 5%
libgtest_main.so 42856 43608 1%
libLLVMLinker.so 3070264 3210248 4%
libLLVMCFGuard.so 1082488 1096104 1%
libLLVMCodeGen.so 139859816 146452160 4%
libLLVMDebugInfoCodeView.so 7742896 8497704 9%
libLLVMX86AsmParser.so 2590816 2714008 4%
libLLVMRuntimeDyld.so 6592016 7094048 7%
libLLVMInstCombine.so 18194728 19705184 8%
BugpointPasses.so 780112 788304 1%
libLLVMScalarOpts.so 73110536 77534912 6%
libLLVMXRay.so 3993696 4251280 6%
libLLVMMCDisassembler.so 1292912 1313536 1%
libLLVMFrontendOpenACC.so 96008 102656 6%
libLLVMInstrumentation.so 21038808 22351384 6%
libLLVMLibDriver.so 1595936 1638704 2%
libLLVMAsmPrinter.so 25818000 26842816 3%
libLLVMAnalysis.so 79615056 83995856 5%

`````````````````````````````````````````````````````````````````````

sum 79615057 83995857 5%

So the conclusion is,

  • We have a Flag which can help us if we want to enable callsite generation selectively (only for Flang)

It doesn't seem to me, so far, like this is a place where Flang and Clang should diverge - they're both doing the same sort of thing for the same reasons/likely with the same sort of tradeoffs of location accuracy V size cost.

  • If we are fine with 5% increase in size, we can enable call-site generation by default.

I'd actually be somewhat more worried about object size, rather than/in addition to executable size, due to the increase in relocations (especially with DWARFv5 (& especially with the -mllvm -minimize-addr-in-v5=Ranges which further reduces debug_addr, but can't handle labels and call sites (unlike -minimize-addr-in-v5=Expressions or -minimize-addr-in-v5=Form)), which does a lot to reduce the number of relocations by using debug_addr and address sharing on debug_rnglists/loclists/etc) which can't be compressed, etc.

alok added a comment.Mon, Apr 19, 9:25 PM

I think that the Debug Entry Values feature should not be enabled by default for non optimized code, so the TargetOptions::ShouldEmitDebugEntryValues() should be patched with checking of optimization level (it should be > 0).

That's currently intended to be already handled by the frontend, right? (clang only sets EnableDebugEntryValues (which ShouldEmitDebugEntryValues checks (hmm, it checks under 'or', not 'and', so I'm not sure where the "only above -O0" is implemented, but it is implemented somewhere?) if optimizations are enabled, yeah?)

Oh, is entry_values actually not conditionalized? It's only the call_site support that's currently conditionalized on "above -O0"?

Looks like there is no explicit check of optimization level (above "-O0"), neither on frontend nor backend for entry-values generation. I think it is the situation since there should not be any optimization (at least that I am aware of, in the case of C/C++) that would cause the entry-values generation...

Hmm - If that's the case, and we currently have some cases where entry_values are emitted at -O0, I'm not sure /not/ emitting those is the right call either. If we believe/have data to show that there are so few useful uses of entry_value at -O0 that it's not worth the DWARF size growth to put call_sites in at -O0, then I think it might still be worth leaving the entry_values in (unless they take up a bunch of extra space) for the cases of mixed optimization compilation (-O0 some code you're debugging, but building the rest with optimizations).

Yeah... That is valuable example... I am thinking in that direction as well, and I am closer to the enabling it for -O0 case if that is useful (and there is no dramatic cost in terms of DWARF size).

Does anyone have this example (where DW_OP_entry_value is used at -O0)? It'd be great to look at it & see if it's a case of unnecessarily losing the location, or legitimately losing it and using entry_value for best-effort recovery (& then a question of whether the loss is appropriate at -O0, or if we want to pessimize -O0 further to avoid the loss).

I think this ^ still needs understanding/investigation. Do you have an example with OP_entry_value at -O0?

I'd worry about turning on call_sites at -O0 - there'd be a lot more calls (especially for C++ code with lots of implicit operations), but numbers will be needed in any case, so not worth much speculation.

Sorry for late response.
I tried building https://github.com/flang-compiler/classic-flang-llvm-project.git (branch release_11x) with compiler (current patch (https://reviews.llvm.org/D99160) and https://reviews.llvm.org/D99238) with -O0 -g .
Interestingly there was no difference.
Reason: https://reviews.llvm.org/D99238 is not sufficient for clang/clang++ to enable call-site generation with FastISel though it is sufficient for Flang compiler.
Below additional patch is needed to generate call-sites

`````````````````````````````````````

diff --git a/clang/lib/CodeGen/CGDebugInfo.cpp b/clang/lib/CodeGen/CGDebugInfo.cpp
index a77f52bd235b..8d4e11faa018 100644

  • a/clang/lib/CodeGen/CGDebugInfo.cpp

+++ b/clang/lib/CodeGen/CGDebugInfo.cpp
@@ -5149,9 +5149,9 @@ llvm::DebugLoc CGDebugInfo::SourceLocToDebugLoc(SourceLocation Loc) {
}

llvm::DINode::DIFlags CGDebugInfo::getCallSiteRelatedAttrs() const {

  • // Call site-related attributes are only useful in optimized programs, and
  • // when there's a possibility of debugging backtraces.
  • if (!CGM.getLangOpts().Optimize || DebugKind == codegenoptions::NoDebugInfo ||

+ Call site-related attributes are useful when there's a possibility of
+
debugging backtraces.
+ if (DebugKind == codegenoptions::NoDebugInfo ||

  DebugKind == codegenoptions::LocTrackingOnly)
return llvm::DINode::FlagZero;
`````````````````````````````````````

With this patch Clang/Clang++ turn on LLVM IR flag "DIFlagAllCallsDescribed", in absence of this LLVM does not generate call-site.

With the above patch applied below is the comparison of sizes of shared libraries.

Name of shared library - Size (without callsite) - Size (with callsites) - % incresase in size

```````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````````

PipSqueak.so 73192 75048 2%
SecondLib.so 73192 75048 2%
TestPlugin.so 1694024 1700704 0%
libLLVMDlltoolDriver.so 336568 347872 3%
libLLVMDebugInfoPDB.so 14463832 15360784 6%
libLLVMOrcError.so 108880 111184 2%
libLLVMTarget.so 2645104 2677448 1%
libLLVMFrontendOpenMP.so 2354728 2505232 6%
libLLVMProfileData.so 7901688 8373632 5%
libLLVMOrcJIT.so 28838432 30490640 5%
libLLVMRemarks.so 3311680 3551856 7%
libgtest.so 2374120 2523112 6%
libLLVMDemangle.so 1350616 1490832 10%
libLLVMAsmParser.so 6961216 7366040 5%
LLVMHello.so 394624 396120 0%
libLLVMGlobalISel.so 18886648 19698008 4%
libLLVMDebugInfoMSF.so 1288376 1365040 5%
libLLVMCoverage.so 4225224 4502104 6%
libLLVMFuzzMutate.so 3859968 3973808 2%
libRemarks.so 6696 6696 0%
libLLVMDebugInfoDWARF.so 11914848 12750784 7%
libLLVMMCParser.so 6419464 6873000 7%
libLLVMTableGen.so 4855536 5180760 6%
libLLVMDWARFLinker.so 5407528 5628576 4%
Bye.so 1858848 1872672 0%
libLLVMMCJIT.so 1470952 1526544 3%
libLLVMMC.so 16931504 17741376 4%
libLLVMipo.so 43554712 46019392 5%
libLLVMLineEditor.so 208216 216360 3%
libbenchmark_main.so 18904 19408 2%
libbenchmark.so 3308304 3507632 6%
libLTO.so 2240720 2277408 1%
libLLVMInterpreter.so 2614696 2749128 5%
libLLVMTransformUtils.so 47925248 50476512 5%
libLLVMX86Desc.so 8047928 8213152 2%
libLLVMCoroutines.so 6478080 6766880 4%
libLLVMJITLink.so 5590936 6066736 8%
libLLVMVectorize.so 19557808 20665544 5%
libLLVMX86Disassembler.so 2820056 2849376 1%
libLLVMBitReader.so 8282648 8823240 6%
libLLVMMCA.so 3242016 3405624 5%
libLLVMBitWriter.so 6544032 6867976 4%
libLLVMMIRParser.so 5739688 5980104 4%
libLLVMLTO.so 13272272 13786192 3%
libLLVMCore.so 46109224 48840008 5%
libLLVMBitstreamReader.so 561896 600624 6%
libLLVMObjectYAML.so 23110160 24648160 6%
libLLVMSupport.so 20349728 21953112 7%
libLLVMIRReader.so 1215672 1237960 1%
libLLVMX86Info.so 76488 76624 0%
libLLVMSelectionDAG.so 34358968 36876128 7%
libLLVMExecutionEngine.so 2962160 3073224 3%
libLLVMSymbolize.so 1980760 2089728 5%
libLLVMPasses.so 18574960 19459712 4%
libLLVMOption.so 869784 920976 5%
libLLVMObject.so 15138656 16383168 8%
libLLVMTextAPI.so 3191272 3384560 6%
libLLVMX86CodeGen.so 55202744 58068088 5%
libLLVMAggressiveInstCombine.so 2354936 2419048 2%
libLLVMExtensions.so 23904 23904 0%
libLLVMWindowsManifest.so 351608 370168 5%
libLLVMObjCARCOpts.so 4964488 5150120 3%
libLLVMBinaryFormat.so 1325752 1431448 7%
libLLVMDebugInfoGSYM.so 2909272 3094616 6%
libLLVMTestingSupport.so 623608 656640 5%
libgtest_main.so 42856 43608 1%
libLLVMLinker.so 3070264 3210248 4%
libLLVMCFGuard.so 1082488 1096104 1%
libLLVMCodeGen.so 139859816 146452160 4%
libLLVMDebugInfoCodeView.so 7742896 8497704 9%
libLLVMX86AsmParser.so 2590816 2714008 4%
libLLVMRuntimeDyld.so 6592016 7094048 7%
libLLVMInstCombine.so 18194728 19705184 8%
BugpointPasses.so 780112 788304 1%
libLLVMScalarOpts.so 73110536 77534912 6%
libLLVMXRay.so 3993696 4251280 6%
libLLVMMCDisassembler.so 1292912 1313536 1%
libLLVMFrontendOpenACC.so 96008 102656 6%
libLLVMInstrumentation.so 21038808 22351384 6%
libLLVMLibDriver.so 1595936 1638704 2%
libLLVMAsmPrinter.so 25818000 26842816 3%
libLLVMAnalysis.so 79615056 83995856 5%

`````````````````````````````````````````````````````````````````````

sum 79615057 83995857 5%

So the conclusion is,

  • We have a Flag which can help us if we want to enable callsite generation selectively (only for Flang)

It doesn't seem to me, so far, like this is a place where Flang and Clang should diverge - they're both doing the same sort of thing for the same reasons/likely with the same sort of tradeoffs of location accuracy V size cost.

  • If we are fine with 5% increase in size, we can enable call-site generation by default.

I'd actually be somewhat more worried about object size, rather than/in addition to executable size, due to the increase in relocations (especially with DWARFv5 (& especially with the -mllvm -minimize-addr-in-v5=Ranges which further reduces debug_addr, but can't handle labels and call sites (unlike -minimize-addr-in-v5=Expressions or -minimize-addr-in-v5=Form)), which does a lot to reduce the number of relocations by using debug_addr and address sharing on debug_rnglists/loclists/etc) which can't be compressed, etc.

Below is the comparison of total size of object files (*.o) for same build (default (O0) + -g).

sum 3535044960 3680496672 4%

I shall come back with numbers for options O0 -gdwarf-5 -mllvm -minimize-addr-in-v5=Ranges .