This is an archive of the discontinued LLVM Phabricator instance.

[clang] report inlining decisions with -Wattribute-{warning|error}
AcceptedPublic

Authored by nickdesaulniers on Jan 10 2023, 5:44 PM.

Details

Summary

Due to inlining, descovering which specific call site to a function with
the attribute "warning" or "error" is painful.

In the IR record inlining decisions in metadata when inlining a callee
that itself contains a call to a dontcall-error or dontcall-warn fn.

Print this info so that it's clearer which call site is problematic.

There's still some limitations with this approach; macro expansion is
not recorded.

Fixes: https://github.com/ClangBuiltLinux/linux/issues/1571

Diff Detail

Event Timeline

Herald added a project: Restricted Project. · View Herald TranscriptJan 10 2023, 5:44 PM
nickdesaulniers requested review of this revision.Jan 10 2023, 5:44 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 10 2023, 5:44 PM
nickdesaulniers planned changes to this revision.Jan 10 2023, 5:45 PM
nickdesaulniers added a subscriber: kees.
  • update optimize test
  • use std::string, demangle C++
  • update tests
nickdesaulniers edited the summary of this revision. (Show Details)
  • remove test comment in commit message
  • add tests for alwaysinline
arsenm added inline comments.Jan 11 2023, 1:29 PM
llvm/lib/Transforms/Utils/InlineFunction.cpp
2467–2468

Misses constexpr casts and aliases

2471

metadata usually uses . word separators

llvm/test/Transforms/Inline/dontcall-attributes.ll
9

test alias case?

nickdesaulniers marked an inline comment as done.
  • replace inlined-from w/ inlined.from
  • mention inlined.from in LangRef
llvm/lib/Transforms/Utils/InlineFunction.cpp
2467–2468

The base feature doesn't work with aliases (or ConstantExpr), in GCC or Clang. I should perhaps fix that first...

hmm, did we/do we ever motivate backend diagnostics using the debug info locations? I thought we did in some cases/ways - maybe when the frontend isn't available (eg: when compiling from LLVM IR).

It'd be nice not to invent a new way of tracking inlining separate from the way debug info does this - duplicate systems with separate opportunities for bugs, etc. Any chance we can reuse the debug info inlining descriptions for this?

There are two different approaches we use for getting source locations for diagnostics coming out of the backend; either embedding clang SourceLocations into IR, or trying to translate DWARF locations. The advantage of using clang SourceLocations is that we never lose any fidelity, even in complex cases involving the preprocessor, and the compiler doesn't have to spend time generating complete DWARF info. The advantage of DWARF, of course, is that it's existing, mature infrastructure.

We use clang SourceLocations for inline asm and for the warning/error attributes, we use DWARF for optimization remarks.

There's probably some argument for using DWARF here.


Am I reading the correctly, that currently the inlining notes don't have source locations? That's not very friendly (particularly for display in IDEs).

nickdesaulniers added a comment.EditedJan 11 2023, 4:03 PM

There are two different approaches we use for getting source locations for diagnostics coming out of the backend; either embedding clang SourceLocations into IR, or trying to translate DWARF locations. The advantage of using clang SourceLocations is that we never lose any fidelity, even in complex cases involving the preprocessor, and the compiler doesn't have to spend time generating complete DWARF info. The advantage of DWARF, of course, is that it's existing, mature infrastructure.

We use clang SourceLocations for inline asm and for the warning/error attributes, we use DWARF for optimization remarks.

Yep.

There's probably some argument for using DWARF here.

Yeah, my thoughts too. Though if we don't compile with -g, does that mean if I use DILocation (or w/e metadata for "inlining occurred") that .debug_info will get emitted, even if the user didn't ask for it? If not, I can probably switch everything to use DILocation.


Am I reading the correctly, that currently the inlining notes don't have source locations? That's not very friendly (particularly for display in IDEs).

Correct; we pessimistically don't record the srcloc of every call ever. We'd need to do that to support line info on the inlining notes. That seems very expensive.

Perhaps we only show these notes if debug info is enabled (-g) where we can then use DiagnosticInfoWithLocationBase? (Do we have precendent for warnings that are better with DebugInfo enabled?)

That said, the information provided is A MASSIVE LEAP FORWARD in debugging these errors (mostly observed in FORTIFY_SOURCE implementations in the Linux kernel) by simply providing the call chain, even without line/col numbers.

Yeah, my thoughts too. Though if we don't compile with -g, does that mean if I use DILocation (or w/e metadata for "inlining occurred") that .debug_info will get emitted, even if the user didn't ask for it? If not, I can probably switch everything to use DILocation.

clang has a "LocTrackingOnly" setting for debug info, which emits DILocation info into the IR, but emits a marker into the DICompileUnit to skip emitting the .debug_info in the backend. We currently use it for -Rpass. We don't do this by default, I think to save compile time.

clang has a "LocTrackingOnly" setting for debug info, which emits DILocation info into the IR, but emits a marker into the DICompileUnit to skip emitting the .debug_info in the backend. We currently use it for -Rpass. We don't do this by default, I think to save compile time.

Specifically emissionKind: NoDebug, example:

!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 16.0.0 (git@github.com:llvm/llvm-project.git 7b433e026498cf4176931b2407baece1d5060e16)", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, splitDebugInlining: false, nameTableKind: None)

Though should the frontend be setting codegen options when parsing? Would the idea be to try to re-set OPT_debug_info_kind_EQ when clang codegen's IR for a function with such an attribute?


It'd be nice not to invent a new way of tracking inlining separate from the way debug info does this - duplicate systems with separate opportunities for bugs, etc. Any chance we can reuse the debug info inlining descriptions for this?

So it looks like we have:

!28 = !DILocation(line: 14, column: 3, scope: !8, inlinedAt: !29)

Let me see if I can create DILocation without line or column values. The DISubprogram and DILocation should form a similar chain, even if significantly more complicated to "unwind."

llvm/lib/Transforms/Utils/InlineFunction.cpp
2467–2468

Perhaps I should use Call.getCalledOperand()->stripPointerCasts() for constantexpr case.

llvm/test/Transforms/Inline/dontcall-attributes.ll
9

I actually disagree about aliases. I don't think aliases should implicitly inherit function attributes.

clang has a "LocTrackingOnly" setting for debug info, which emits DILocation info into the IR, but emits a marker into the DICompileUnit to skip emitting the .debug_info in the backend. We currently use it for -Rpass. We don't do this by default, I think to save compile time.

Specifically emissionKind: NoDebug, example:

!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 16.0.0 (git@github.com:llvm/llvm-project.git 7b433e026498cf4176931b2407baece1d5060e16)", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, splitDebugInlining: false, nameTableKind: None)

Though should the frontend be setting codegen options when parsing? Would the idea be to try to re-set OPT_debug_info_kind_EQ when clang codegen's IR for a function with such an attribute?

Probably turn on emissionKind: NoDebug whenever the warning is enabled?


It'd be nice not to invent a new way of tracking inlining separate from the way debug info does this - duplicate systems with separate opportunities for bugs, etc. Any chance we can reuse the debug info inlining descriptions for this?

So it looks like we have:

!28 = !DILocation(line: 14, column: 3, scope: !8, inlinedAt: !29)

Let me see if I can create DILocation without line or column values.

Not sure I follow - why would you want to drop line/column info? Isn't that relevant to the inlining stack - you might have multiple calls in the same function & it'd be good to know which one the diagnostic is referring to.

The DISubprogram and DILocation should form a similar chain, even if significantly more complicated to "unwind."

clang has a "LocTrackingOnly" setting for debug info, which emits DILocation info into the IR, but emits a marker into the DICompileUnit to skip emitting the .debug_info in the backend. We currently use it for -Rpass. We don't do this by default, I think to save compile time.

Specifically emissionKind: NoDebug, example:

!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 16.0.0 (git@github.com:llvm/llvm-project.git 7b433e026498cf4176931b2407baece1d5060e16)", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, splitDebugInlining: false, nameTableKind: None)

Though should the frontend be setting codegen options when parsing? Would the idea be to try to re-set OPT_debug_info_kind_EQ when clang codegen's IR for a function with such an attribute?

Probably turn on emissionKind: NoDebug whenever the warning is enabled?

So this warning is default enabled, IIRC. I guess I need to check for -Wno-attribute-warning. If -Wno-attribute-warning is not set and -g (or friends) is not set, then I should set emissionKind to NoDebug (I think).


It'd be nice not to invent a new way of tracking inlining separate from the way debug info does this - duplicate systems with separate opportunities for bugs, etc. Any chance we can reuse the debug info inlining descriptions for this?

So it looks like we have:

!28 = !DILocation(line: 14, column: 3, scope: !8, inlinedAt: !29)

Let me see if I can create DILocation without line or column values.

Not sure I follow - why would you want to drop line/column info? Isn't that relevant to the inlining stack - you might have multiple calls in the same function & it'd be good to know which one the diagnostic is referring to.

The DISubprogram and DILocation should form a similar chain, even if significantly more complicated to "unwind."

Not dropping it, more like without -g (or friends) it never existed in the first place. If you look at my change to llvm/lib/Transforms/Utils/InlineFunction.cpp (in this patch), I'm basically synthesizing metadata during inlining. If we don't have Debug Info related metadata because the program wasn't compiled with -g (or w/e), then the DILocation for the callsites was never produced by the frontend in the first place. This patch doesn't intentionally drop anything, it's more like the anything might never have existed in the first place.

So this warning could "be improved" if you recompiled with -g; I don't think there's really precedent for that and expect it's perhaps controversial. Hence my custom metadate nodes rather than the existing DILocation.

llvm/lib/Transforms/Utils/InlineFunction.cpp
2467–2468

I've added support for ConstantExpr to the base feature in D142058. I should still fix this in this PR and add a test case, so not yet marking this thread as done.

clang has a "LocTrackingOnly" setting for debug info, which emits DILocation info into the IR, but emits a marker into the DICompileUnit to skip emitting the .debug_info in the backend. We currently use it for -Rpass. We don't do this by default, I think to save compile time.

Specifically emissionKind: NoDebug, example:

!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 16.0.0 (git@github.com:llvm/llvm-project.git 7b433e026498cf4176931b2407baece1d5060e16)", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, splitDebugInlining: false, nameTableKind: None)

Though should the frontend be setting codegen options when parsing? Would the idea be to try to re-set OPT_debug_info_kind_EQ when clang codegen's IR for a function with such an attribute?

Probably turn on emissionKind: NoDebug whenever the warning is enabled?

So this warning is default enabled, IIRC.

Then it might be a broader question about whether this extra info is acceptable to turn on by default, and if not, maybe an extra flag would be needed to say "give me extra info in this diagnostic", basically (or a separate warning flag that's off by default) - some perf metrics might help indicate whether the extra info is cheap enough.

I guess I need to check for -Wno-attribute-warning. If -Wno-attribute-warning is not set and -g (or friends) is not set, then I should set emissionKind to NoDebug (I think).

Generally you shouldn't be checking the raw flags like this for a feature like this - I'd expect something in Clang that checks to see if the warning is enabled, rather than whether certain flag spellings are used, etc. Though what goes in the driver V frontend, etc complicates things (if the decision is made in the frontend, once warning flags are parsed and there's a diagnostic engine you can query for warning enablement, then it'd need to override the debug info level rather than the command line handling in the driver)


It'd be nice not to invent a new way of tracking inlining separate from the way debug info does this - duplicate systems with separate opportunities for bugs, etc. Any chance we can reuse the debug info inlining descriptions for this?

So it looks like we have:

!28 = !DILocation(line: 14, column: 3, scope: !8, inlinedAt: !29)

Let me see if I can create DILocation without line or column values.

Not sure I follow - why would you want to drop line/column info? Isn't that relevant to the inlining stack - you might have multiple calls in the same function & it'd be good to know which one the diagnostic is referring to.

The DISubprogram and DILocation should form a similar chain, even if significantly more complicated to "unwind."

Not dropping it, more like without -g (or friends) it never existed in the first place. If you look at my change to llvm/lib/Transforms/Utils/InlineFunction.cpp (in this patch), I'm basically synthesizing metadata during inlining. If we don't have Debug Info related metadata because the program wasn't compiled with -g (or w/e), then the DILocation for the callsites was never produced by the frontend in the first place. This patch doesn't intentionally drop anything, it's more like the anything might never have existed in the first place.

So this warning could "be improved" if you recompiled with -g; I don't think there's really precedent for that and expect it's perhaps controversial. Hence my custom metadate nodes rather than the existing DILocation.

I'm still a bit confused/not following about the "Let me see if I can create DILocation without line or column values." - why do you want to create that?

clang has a "LocTrackingOnly" setting for debug info, which emits DILocation info into the IR, but emits a marker into the DICompileUnit to skip emitting the .debug_info in the backend. We currently use it for -Rpass. We don't do this by default, I think to save compile time.

Specifically emissionKind: NoDebug, example:

!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 16.0.0 (git@github.com:llvm/llvm-project.git 7b433e026498cf4176931b2407baece1d5060e16)", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, splitDebugInlining: false, nameTableKind: None)

Though should the frontend be setting codegen options when parsing? Would the idea be to try to re-set OPT_debug_info_kind_EQ when clang codegen's IR for a function with such an attribute?

Probably turn on emissionKind: NoDebug whenever the warning is enabled?

So this warning is default enabled, IIRC.

Then it might be a broader question about whether this extra info is acceptable to turn on by default, and if not, maybe an extra flag would be needed to say "give me extra info in this diagnostic", basically (or a separate warning flag that's off by default) - some perf metrics might help indicate whether the extra info is cheap enough.

The implementation as it stands is zero cost in the sense that if __attribute__((warning(""))) or __attribute__((error(""))) aren't used in the source code, there is ZERO cost in IR. Additional metadata is only created when inlining occurs (which might not happen if optimizations weren't enabled) where callsites to such functions exist (unlikely), and if the call site is optimized away, the metadata should be deleted (at least, I imagine that's how metadata works in LLVM. I should actually verify that's the case, then probably add such verification as a test to this patch, perhaps.

I'm still a bit confused/not following about the "Let me see if I can create DILocation without line or column values." - why do you want to create that?

You're the one that asked if if I can reuse existing metadata nodes, specifically:

It'd be nice not to invent a new way of tracking inlining separate from the way debug info does this

So I could emit DILocation and DISubprogram metadata rather than the custom metadata nodes as I'm doing in this diff. I just wouldn't be able to synthesize line or column info for the callsites, since the frontend may not have ever produced it in the first place. Such an approach would be emitting DILocation+DISubprogram from the backend, not the frontend; the backend doesn't have line+column info unless debugging was enabled in the first place (which it probably wasn't).

I don't want to pessimistically emit these always from the frontend with precise line+column info. That would be so pessimistic for the common case where you probably never encounter functions with such attributes in a given TU.

clang has a "LocTrackingOnly" setting for debug info, which emits DILocation info into the IR, but emits a marker into the DICompileUnit to skip emitting the .debug_info in the backend. We currently use it for -Rpass. We don't do this by default, I think to save compile time.

Specifically emissionKind: NoDebug, example:

!0 = distinct !DICompileUnit(language: DW_LANG_C11, file: !1, producer: "clang version 16.0.0 (git@github.com:llvm/llvm-project.git 7b433e026498cf4176931b2407baece1d5060e16)", isOptimized: true, runtimeVersion: 0, emissionKind: NoDebug, splitDebugInlining: false, nameTableKind: None)

Though should the frontend be setting codegen options when parsing? Would the idea be to try to re-set OPT_debug_info_kind_EQ when clang codegen's IR for a function with such an attribute?

Probably turn on emissionKind: NoDebug whenever the warning is enabled?

So this warning is default enabled, IIRC.

Then it might be a broader question about whether this extra info is acceptable to turn on by default, and if not, maybe an extra flag would be needed to say "give me extra info in this diagnostic", basically (or a separate warning flag that's off by default) - some perf metrics might help indicate whether the extra info is cheap enough.

The implementation as it stands is zero cost in the sense that if __attribute__((warning(""))) or __attribute__((error(""))) aren't used in the source code, there is ZERO cost in IR. Additional metadata is only created when inlining occurs (which might not happen if optimizations weren't enabled) where callsites to such functions exist (unlikely), and if the call site is optimized away, the metadata should be deleted (at least, I imagine that's how metadata works in LLVM. I should actually verify that's the case, then probably add such verification as a test to this patch, perhaps.

I'm still a bit confused/not following about the "Let me see if I can create DILocation without line or column values." - why do you want to create that?

You're the one that asked if if I can reuse existing metadata nodes, specifically:

It'd be nice not to invent a new way of tracking inlining separate from the way debug info does this

Right - I was thinking more, as above, about directly using the existing metadata generation (if it's too expensive to enable by default, then possibly under an off-by-default warning or other flag) that the inliner already knows how to read and write, rather than creating new/different metadata handling.

Perhaps there's already logic for enabling this for remarks (CompilerInvocation.cpp:~2019).

Though it's possibly a path forward to generate new DILocations in the absence of any - it's not the top of my list reach for here, but that's just my 2c. Perhaps other folks have other ideas.

It's possible that this approach might be able to fill a gap we sometimes have in debug info, though - where a call site lacks a DILocation and that makes creating the inline debug info difficult/impossible at the moment (there's a whole long history of this issue, the assertions and verifier checks we've implemented to detect it, and the bugs (frontend missing opportunities to emit useful DILocations) it has found/lead to being fixed, though - so I'm somewhat hesitant to make us more permissive on constructing debug info inlining for call sites without DILocations). (@aprantl @JDevlieghere perhaps you folks have thoughts on this?)

So I could emit DILocation and DISubprogram metadata rather than the custom metadata nodes as I'm doing in this diff. I just wouldn't be able to synthesize line or column info for the callsites, since the frontend may not have ever produced it in the first place. Such an approach would be emitting DILocation+DISubprogram from the backend, not the frontend; the backend doesn't have line+column info unless debugging was enabled in the first place (which it probably wasn't).

I don't want to pessimistically emit these always from the frontend with precise line+column info. That would be so pessimistic for the common case where you probably never encounter functions with such attributes in a given TU.

It's an awkward situation, to be sure.

Looks like GCC does include the line number in its backtrace? So maybe that's a justification for us to do so as well. Again, might be worth knowing what the cost of the debug info metadata loc tracking mode is.

Right - I was thinking more, as above, about directly using the existing metadata generation (if it's too expensive to enable by default, then possibly under an off-by-default warning or other flag) that the inliner already knows how to read and write, rather than creating new/different metadata handling.
Again, might be worth knowing what the cost of the debug info metadata loc tracking mode is.

For a first order approximation, I have this diff applied:

diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index 436226c6f178..6d5049803188 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -386,7 +386,7 @@ VALUE_CODEGENOPT(SmallDataLimit, 32, 0)
 VALUE_CODEGENOPT(SSPBufferSize, 32, 0)
 
 /// The kind of generated debug info.
-ENUM_CODEGENOPT(DebugInfo, codegenoptions::DebugInfoKind, 4, codegenoptions::NoDebugInfo)
+ENUM_CODEGENOPT(DebugInfo, codegenoptions::DebugInfoKind, 4, codegenoptions::LocTrackingOnly)
 
 /// Whether to generate macro debug info.
 CODEGENOPT(MacroDebugInfo, 1, 0)

This changes the default value of codegenopts.DebugInfo from codegenoptions::NoDebugInfo to codegenoptions::LocTrackingOnly, which is what the optimization remark emitter infra uses. This emits more debug info than we need (metadata for every statement in IR rather than JUST call instructions). But it would give me analogous metadata I could use to solve this problem addressed by this patch, and it would guarantee that I had precise col/line info.

Comparing 30 Linux kernel x86_64 defconfig (i.e. no debug info) builds with vs without that change:

Without (baseline):

$ hyperfine --prepare 'make LLVM=1 -j128 -s clean' --runs 30 'make LLVM=1 -j128 -s'
Benchmark 1: make LLVM=1 -j128 -s
  Time (mean ± σ):     61.592 s ±  0.156 s    [User: 4378.360 s, System: 312.040 s]
  Range (min … max):   61.283 s … 62.026 s    30 runs

With diff from above:

$ hyperfine --prepare 'make LLVM=1 -j128 -s clean' --runs 30 'make LLVM=1 -j128 -s'  
Benchmark 1: make LLVM=1 -j128 -s
  Time (mean ± σ):     62.228 s ±  0.523 s    [User: 4433.828 s, System: 312.908 s]
  Range (min … max):   61.825 s … 64.912 s    30 runs

So that's a slowdown from the mean of ~1.02% ((1 - 61.592/62.228)*100). We probably could claw some of that back if we had another level of codegenoptions between NoDebugInfo and LocTrackingOnly that only emitted LocTracking info for call insts (stated another way, omit DILocation for all instructions other than call). I'm guessing that would take significant work to add to clang; I wasn't able how to figure out how to do so quickly. I imagine updating the non-debug-info clang tests to also be a treat. Is a 1% compile time performance hit worth more precise backend diagnostics? Probably (IMO). Perhaps worth an RFC?


FWIW, here are measurements of D141451 @ Diff 488371 at the same base as the baseline:

$ hyperfine --prepare 'make LLVM=1 -j128 -s clean' --runs 30 'make LLVM=1 -j128 -s'
Benchmark 1: make LLVM=1 -j128 -s
  Time (mean ± σ):     61.702 s ±  0.456 s    [User: 4395.106 s, System: 313.120 s]
  Range (min … max):   61.409 s … 64.000 s    30 runs

That's ~0.17% slower (and my machine is running hot from benchmarking for the past 1.5hrs). Note this patch/approach doesn't have precise line info.

Right - I was thinking more, as above, about directly using the existing metadata generation (if it's too expensive to enable by default, then possibly under an off-by-default warning or other flag) that the inliner already knows how to read and write, rather than creating new/different metadata handling.
Again, might be worth knowing what the cost of the debug info metadata loc tracking mode is.

For a first order approximation, I have this diff applied:

diff --git a/clang/include/clang/Basic/CodeGenOptions.def b/clang/include/clang/Basic/CodeGenOptions.def
index 436226c6f178..6d5049803188 100644
--- a/clang/include/clang/Basic/CodeGenOptions.def
+++ b/clang/include/clang/Basic/CodeGenOptions.def
@@ -386,7 +386,7 @@ VALUE_CODEGENOPT(SmallDataLimit, 32, 0)
 VALUE_CODEGENOPT(SSPBufferSize, 32, 0)
 
 /// The kind of generated debug info.
-ENUM_CODEGENOPT(DebugInfo, codegenoptions::DebugInfoKind, 4, codegenoptions::NoDebugInfo)
+ENUM_CODEGENOPT(DebugInfo, codegenoptions::DebugInfoKind, 4, codegenoptions::LocTrackingOnly)
 
 /// Whether to generate macro debug info.
 CODEGENOPT(MacroDebugInfo, 1, 0)

This changes the default value of codegenopts.DebugInfo from codegenoptions::NoDebugInfo to codegenoptions::LocTrackingOnly, which is what the optimization remark emitter infra uses. This emits more debug info than we need (metadata for every statement in IR rather than JUST call instructions). But it would give me analogous metadata I could use to solve this problem addressed by this patch, and it would guarantee that I had precise col/line info.

Comparing 30 Linux kernel x86_64 defconfig (i.e. no debug info) builds with vs without that change:

Without (baseline):

$ hyperfine --prepare 'make LLVM=1 -j128 -s clean' --runs 30 'make LLVM=1 -j128 -s'
Benchmark 1: make LLVM=1 -j128 -s
  Time (mean ± σ):     61.592 s ±  0.156 s    [User: 4378.360 s, System: 312.040 s]
  Range (min … max):   61.283 s … 62.026 s    30 runs

With diff from above:

$ hyperfine --prepare 'make LLVM=1 -j128 -s clean' --runs 30 'make LLVM=1 -j128 -s'  
Benchmark 1: make LLVM=1 -j128 -s
  Time (mean ± σ):     62.228 s ±  0.523 s    [User: 4433.828 s, System: 312.908 s]
  Range (min … max):   61.825 s … 64.912 s    30 runs

So that's a slowdown from the mean of ~1.02% ((1 - 61.592/62.228)*100). We probably could claw some of that back if we had another level of codegenoptions between NoDebugInfo and LocTrackingOnly that only emitted LocTracking info for call insts (stated another way, omit DILocation for all instructions other than call). I'm guessing that would take significant work to add to clang;

I don't think that should be too costly/is probably worthwhile if the cost is linear on the number of instructions we have to give locations & calls are probably a minority of instructions overall.

This level would be in CodeGenOptions, but wouldn't have to be rendered into the LLVM DICompileUnit level - since it'd still be the same "there's some debug info here but it's not for emitting DWARF" as far as LLVM was concerned.

I wasn't able how to figure out how to do so quickly. I imagine updating the non-debug-info clang tests to also be a treat.

Hopefully non-debug-info tests won't generally care about a bit of extra metadata, but no doubt some'll need updating.

Is a 1% compile time performance hit worth more precise backend diagnostics? Probably (IMO). Perhaps worth an RFC?

Probably

I guess my only other question (worth mentioning in the RFC, perhaps)/direction would be that any of these "things that are expensive but make backend diagnostics better" could be to put them under a different warning flag (if making -Wattribute-{warning|error} off by default isn't feasible - leaving them on-by-default but less than great user experience when optimizing code, and have the warning suggest enabling a different warning flag that's more verbose but off by default and also enables the extra location tracking - so codebases that generally do use these attributes can opt in to that, while codebases that generally don't do that don't entirely lose/ignore the attributes).

I guess my only other question (worth mentioning in the RFC, perhaps)/direction would be that any of these "things that are expensive but make backend diagnostics better" could be to put them under a different warning flag (if making -Wattribute-{warning|error} off by default isn't feasible - leaving them on-by-default but less than great user experience when optimizing code, and have the warning suggest enabling a different warning flag that's more verbose but off by default and also enables the extra location tracking - so codebases that generally do use these attributes can opt in to that, while codebases that generally don't do that don't entirely lose/ignore the attributes).

I don't think it's worth it to try to make these "zero cost." There are two function attributes here: __attribute__((error(""))) and __attribute__((warning(""))). They are basically implemented precisely the same way except how diagnostics are finally emitted by the frontend:

For __attribute__((error(""))), we always emit an error. There is no -Wno-error=XXX or -Wno-XXX flag to change that. (i.e. no such -Wattribute-error)

For __attribute__((warning(""))), we emit a warning by default, an error if -Werror=attribute-warning is set, and nothing if -Wno-attribute-warning is set (unlikely).

Since -Wno-attribute-warning is unlikely to ever be set, it's not worth making that case "zero cost."


Will work on that RFC though!

This is right on the line of what I think is reasonable in terms of diagnostics from the backend. It provides useful information to users about inlining decisions, so that's awesome when that information is helpful. But those inlining decisions can be arbitrarily long and complex, so this can emit a *lot* of note diagnostics on real world code which can make the diagnostic harder to reason about. In general, I think inlining decisions should be left to an optimization report, but this information could make interpreting the error or warning attributes easier.

Have you checked to see how the diagnostics work in practice on deep inlining stacks? If it's usually only 1-2 extra notes, that's probably not going to overwhelm anyone. But if it's significantly more, that could be distracting. Do we want any options to control when to emit the notes? e.g., do we want to give users a way to say "only emit the last N inlining decision notes"?

Also, how does this work with LTO when it makes different inlining decisions?

clang/test/Frontend/backend-attribute-error-warning-optimize.c
12

Instead of allowing this note to appear anywhere in the file, I think it's better to use "bookmark" comments. e.g.,

void baz() { // #baz_defn
}

void foo() {
  baz(); // expected-note@#baz_defn {{whatever note text}}
}

so that we're testing where the diagnostic is emitted. (This is especially important given that the changes are about location tracking.)

Just comments from the CFE.

clang/include/clang/Basic/DiagnosticFrontendKinds.td
94

This tab in the diagnostic is odd, we never do this, we just count on the cascading notes to be clear. Also, this probably needs bikeshedding for diagnostic messages.

clang/lib/CodeGen/CodeGenAction.cpp
857

Could we instead just make demangle take a string_view here? It takes it by const-ref, which shows that it doesn't really seem to need it to be a string, so I would imagine this would be a minor refactor (to add such an overload).

clang/test/Frontend/backend-attribute-error-warning-optimize.c
29

Same comment re-bookmarks, but this diagnostic seems awkward.

Should we instead say :

call to 'foo' declared with 'error' attribute: oh no foo
called by function 'a'
inlined by function 'b'
inlined by function 'd'?

nickdesaulniers marked an inline comment as done.Apr 12 2023, 1:54 PM
nickdesaulniers added inline comments.
clang/lib/CodeGen/CodeGenAction.cpp
857

Do you mean StringRef rather than std::string_view? The progammer's manual provides guidance on the former (https://llvm.org/docs/ProgrammersManual.html#the-stringref-class) but not that latter.

Otherwise is this what you had in mind?

diff --git a/llvm/include/llvm/Demangle/Demangle.h b/llvm/include/llvm/Demangle/Demangle.h
index 6133d0b95bbf..c47222f883e9 100644
--- a/llvm/include/llvm/Demangle/Demangle.h
+++ b/llvm/include/llvm/Demangle/Demangle.h
@@ -11,6 +11,7 @@
 
 #include <cstddef>
 #include <string>
+#include <string_view>
 
 namespace llvm {
 /// This is a llvm local version of __cxa_demangle. Other than the name and
@@ -68,7 +69,7 @@ char *dlangDemangle(const char *MangledName);
 /// \param MangledName - reference to string to demangle.
 /// \returns - the demangled string, or a copy of the input string if no
 /// demangling occurred.
-std::string demangle(const std::string &MangledName);
+std::string demangle(const std::string_view MangledName);
 
 bool nonMicrosoftDemangle(const char *MangledName, std::string &Result);
 
diff --git a/llvm/lib/Demangle/Demangle.cpp b/llvm/lib/Demangle/Demangle.cpp
index 9d128424cabf..408112b9248e 100644
--- a/llvm/lib/Demangle/Demangle.cpp
+++ b/llvm/lib/Demangle/Demangle.cpp
@@ -26,9 +26,10 @@ static bool isDLangEncoding(const std::string &MangledName) {
          MangledName[1] == 'D';
 }
 
-std::string llvm::demangle(const std::string &MangledName) {
+std::string llvm::demangle(const std::string_view MangledName) {
   std::string Result;
-  const char *S = MangledName.c_str();
+  std::string Mangled(MangledName);
+  const char *S = Mangled.c_str();
 
   if (nonMicrosoftDemangle(S, Result))
     return Result;
@@ -43,7 +44,7 @@ std::string llvm::demangle(const std::string &MangledName) {
     return Result;
   }
 
-  return MangledName;
+  return Mangled;
 }
 
 bool llvm::nonMicrosoftDemangle(const char *MangledName, std::string &Result) {
clang/test/Frontend/backend-attribute-error-warning-optimize.c
12

oh, that's new (to me)! will do

29

ah, yeah, I have this backwards (in v1, I had fixed this in the alt impl https://reviews.llvm.org/D145994). Let me fix that up here now, too. Good catch!

llvm/lib/Transforms/Utils/InlineFunction.cpp
2467–2468

marking this thread as done; see other thread below. If someone adds an alias, maybe they _intend_ to call the warning/error attributed function for [[good reason]].

Have you checked to see how the diagnostics work in practice on deep inlining stacks? If it's usually only 1-2 extra notes, that's probably not going to overwhelm anyone. But if it's significantly more, that could be distracting. Do we want any options to control when to emit the notes? e.g., do we want to give users a way to say "only emit the last N inlining decision notes"?

I don't have a good means of aggregating such data, but in my kernel experience we only ever emit about 2-4 notes before we stop inlining. It is usually much less than the inclusion chain of headers reported. Example (I've manually added a bad memcmp to kernel sources):

In file included from drivers/nfc/nfcmrvl/fw_dnld.c:8:
In file included from ./include/linux/module.h:13:
In file included from ./include/linux/stat.h:19:
In file included from ./include/linux/time.h:60:
In file included from ./include/linux/time32.h:13:
In file included from ./include/linux/timex.h:67:
In file included from ./arch/x86/include/asm/timex.h:5:
In file included from ./arch/x86/include/asm/processor.h:23:
In file included from ./arch/x86/include/asm/msr.h:11:
In file included from ./arch/x86/include/asm/cpumask.h:5:
In file included from ./include/linux/cpumask.h:12:
In file included from ./include/linux/bitmap.h:11:
In file included from ./include/linux/string.h:254:
./include/linux/fortify-string.h:661:4: error: call to '__read_overflow2' declared with 'error' attribute: detected read beyond size of object (2nd parameter)
                        __read_overflow2();
                        ^
note: In function 'fw_dnld_rx_work'
note:   which inlined function 'process_state_reset'
note:   which inlined function 'memcmp(void const*, void const* pass_dynamic_object_size0, unsigned long pass_dynamic_object_size0)'
1 error generated.

Can you construct come "cuckoo-bananas" cases with this though? Yes: https://godbolt.org/z/9eP8vjeb8

Also, how does this work with LTO when it makes different inlining decisions?

The concern with LTO is less about a change in inlining decisions, and more so about loss of fidelity with corresponding SourceLocations.

Testing the same kernel code same above, but with LTO enabled, we simply get a backend diagnostic from LLD, since the sources and thus SourceLocations are not retained:

ld.lld: error: call to __read_overflow2 marked "dontcall-error": detected read beyond size of object (2nd parameter)
nickdesaulniers marked 3 inline comments as done.
  • rebase, invert inlining reporting chain, update diagnostic text
clang/include/clang/Basic/DiagnosticFrontendKinds.td
94

I've removed the tab, and changed this to what you've suggested below. Still open for bikeshedding, but marking this thread as done.

clang/lib/CodeGen/CodeGenAction.cpp
857

That yak shave is going on in https://reviews.llvm.org/D148566; in particular, it will require changes to libcxxabi, since llvm/Demangle is just copied from libcxxabi as the upstream source. As such, marking this thread done.

I'm happy to help clean up such crufty APIs, but we should not block this feature on that.

clang/test/Frontend/backend-attribute-error-warning-optimize.c
12

It looks like the "bookmarks" don't work because the notes do not line+col info. They follow the warning/error diagnostic which does, on the bottom most call site.

The warning is supposed to be emitted on the callsite, not the definition.

erichkeane accepted this revision.Apr 25 2023, 10:25 AM

Sorry, I didn't see your updates I guess! Thanks for the yak-shave btw, I realize that has turned into a heck of a mess :/

CFE LGTM. I don't believe I can approve the LLVM components however.

This revision is now accepted and ready to land.Apr 25 2023, 10:25 AM
aaron.ballman added inline comments.
clang/lib/CodeGen/CodeGenAction.cpp
860

Alternatively:

for (auto Dec : llvm::enumerate(InliningDecisions)) {
  std::string S = llvm::demangle(Dec.value());
  if (Dec.index() == 0)
    Diags.Report(...);
  ...
}
clang/test/Frontend/backend-attribute-error-warning-optimize.c
12

I still don't think this is reasonable for test coverage because this means we'll accept the note *anywhere* in the file. This makes it much too easy to regress the behavior accidentally (e.g., accidentally emit all the notes on line 1 of the file). I think we need *some* way to nail down where these notes are emitted in the test. However, I might be asking you to solve "please track location information through the backend" for that, so perhaps this isn't a reasonable request?

Also, CC @cjdb for awareness of how this kind of diagnostic is produced (Chris is working on an idea for how we emit diagnostics so we get better structured information from them, so knowing about this novel approach might impact that idea).

cjdb added inline comments.Apr 26 2023, 10:03 AM
clang/test/Frontend/backend-attribute-error-warning-optimize.c
12

Very interesting, thanks for the heads up! Cross-phase diagnostics are definitely something I hadn't considered, and it does impact the design (but not in a derailing way).

nickdesaulniers marked an inline comment as done.
  • rebase
  • use llvm::enumerate
clang/test/Frontend/backend-attribute-error-warning-optimize.c
12

I think we're back at "please track location information through the backend".

That is the tradeoff with this approach; not measurably regressing performance when these attributes aren't used in source in exchange for Note diagnostics that lack precise line+col info, but in practice provide enough info for the user to understand what's going wrong where.

Your call if the tradeoff is worth it.

aaron.ballman added inline comments.Apr 27 2023, 5:23 AM
clang/test/Frontend/backend-attribute-error-warning-optimize.c
12

Here's my thinking, please correct any misunderstandings I have:

  • This functionality is primarily for use with _FORTIFY_SOURCE (but not solely for that purpose), and that's an important security feature used by glibc and the Linux kernel.
  • Security of C and C++ source code is Very Important, especially now that various gov't agencies [1][2] are suggesting to not use C or C++ for new projects specifically because the security posture of the languages and their tools.
  • Therefore, the ergonomics of this functionality are quite important for the intended use cases and the ecosystem as a whole.
  • Emitting only the function name but without location information does not give a good user experience, especially in circumstances where the function has multiple overloads, because it pushes the burden onto the programmer to figure out which functions are actually involved in the call chain. This issue is also present in C because of support for __attribute__((overloadable)) and is not just a C++ concern.
  • The compile-time performance costs of tracking this location information are roughly 1%, and there are no runtime or binary size overhead costs unless other functionality is enabled (such as emitting debug info).
  • We can determine whether we need to enable this location tracking while building the AST when we see one of these two attributes being used, so we could enable this functionality selectively without burdening the user with enabling it manually via a flag. (We could still consider giving the user a flag to never track this even in the presence of the attributes if a user had a compelling use case for it.)

If this is all reasonably accurate, I think it's worth seriously considering accepting the costs. I don't want to increase our compile time overhead, but at the same time, if ~1% is a "make or break" amount of compile-time performance to lose, we should be addressing *that* issue rather than hamstringing a user-facing diagnostic feature related to security. It's hard to argue "that CVE was totally worth it because we could then compile 1% faster". That said, perhaps others have a strong contrary opinion they'd like to argue in favor of?

[1] https://media.defense.gov/2022/Nov/10/2003112742/-1/-1/0/CSI_SOFTWARE_MEMORY_SAFETY.PDF
[2] https://doi.org/10.6028/NIST.IR.8397

cjdb added inline comments.Apr 27 2023, 10:38 AM
clang/test/Frontend/backend-attribute-error-warning-optimize.c
12

My attitude is that we should prioritise ergonomics and usability over speed. Yes, having to wait a little longer for stuff to build sucks, but there's a clear ergonomics issue, which is motivation enough for me in this case (adding the CVE argument is the cherry on top). Folks either aren't going to use a feature if it's difficult to get right, or they're going to use it with large amounts of stress (and I don't think this is fair).

Moreover, I think there may be some opportunities we can look into to cancel out this pessimisation.

clang/test/Frontend/backend-attribute-error-warning-optimize.c
12

The compile-time performance costs of tracking this location information are roughly 1%, and there are no runtime or binary size overhead costs unless other functionality is enabled (such as emitting debug info).

(Just a note for passers-by: https://reviews.llvm.org/D141451 is not a 1% performance regression; @aaron.ballman is referring to https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261?u=nickdesaulniers Solution 2 where would could track all location info unconditionally)

That 1% measurement was for Linux kernel builds, as reported https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261?u=nickdesaulniers, and it gets worse for other projects. The ones tracked on llvm compile time tracker were much worse geomean slowdown of 4.77% of wall time to build various open source projects (ranging from 1.95% to 6.9%). http://llvm-compile-time-tracker.com/compare.php?from=c1125ae5b05f5e23c7e61e85afb359e095444d18&to=059eb5565a580f7ea37c2d62dcffdb7401e12e55&stat=wall-time

I did try tracking only source locations of calls.
https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261/11?u=nickdesaulniers
That was faster, but I only measured Linux kernel build times, and not llvm compile time tracker times. I could do a similar experiment (test on llvm compile time tracker) if that helps?

We can determine whether we need to enable this location tracking while building the AST when we see one of these two attributes being used

I had looked into that; I wasn't able to get something like that working. IIRC, it's hard to modify codegen during parsing, or the decision on what kind of DICompileUnit to emit had already been made. Or maybe I was trying to do it from CodeGen after we already started producing IR, I don't remember now. Are we guaranteed to never start CodeGen of IR until after Sema is complete? I could look into this again/try harder though.

We could still consider giving the user a flag to never track this even in the presence of the attributes if a user had a compelling use case for it.

Yeah, I'm concerned with the interaction between <whatever ships> and -g0. If <whatever ships> == location tracking via debug info, then the presence of -g0 may change our diagnostics!

That said, perhaps others have a strong contrary opinion they'd like to argue in favor of?

@efriedma had some such feedback on the RFC: https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261/2?u=nickdesaulniers

aaron.ballman added inline comments.May 1 2023, 12:46 PM
clang/test/Frontend/backend-attribute-error-warning-optimize.c
12

That 1% measurement was for Linux kernel builds, as reported https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261?u=nickdesaulniers, and it gets worse for other projects. The ones tracked on llvm compile time tracker were much worse geomean slowdown of 4.77% of wall time to build various open source projects (ranging from 1.95% to 6.9%). http://llvm-compile-time-tracker.com/compare.php?from=c1125ae5b05f5e23c7e61e85afb359e095444d18&to=059eb5565a580f7ea37c2d62dcffdb7401e12e55&stat=wall-time

Oh, that makes me sad. A 5% slowdown is quite the hit for this.

I did try tracking only source locations of calls.

https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261/11?u=nickdesaulniers
That was faster, but I only measured Linux kernel build times, and not llvm compile time tracker times. I could do a similar experiment (test on llvm compile time tracker) if that helps?

I don't want to send you on a wild goose chase, but if we can find a solution that gets us closer to 1% (or less!), that would make it easier to decide what to do here. I think 5% is... probably too much, but then I wonder "how do we make Fortify work?".

I had looked into that; I wasn't able to get something like that working. IIRC, it's hard to modify codegen during parsing,

Ah, that I'd believe (I forgot, language options are basically const by the time we start parsing). However, could we squirrel this away on ASTContext instead?

That said, perhaps others have a strong contrary opinion they'd like to argue in favor of?

@efriedma had some such feedback on the RFC: https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261/2?u=nickdesaulniers

Ah, thank you for reminding me of that! However, I disagree with this:

(Most code doesn’t normally contain either inline asm or warning/error attributes, and I’m not really convinced by other potential diagnostics.)

because this attribute is also used by glibc: https://codebrowser.dev/glibc/glibc/misc/sys/cdefs.h.html#203 and that macro does get used: https://codebrowser.dev/glibc/glibc/stdlib/bits/stdlib.h.html#33 or https://codebrowser.dev/glibc/glibc/posix/bits/unistd.h.html#32 (etc), so I think this could potentially be significantly wider than just the Linux kernel. Does that change your opinion at all @efriedma?

probably too much, but then I wonder "how do we make Fortify work?".

(I'm still sort of on the side of "if the compile-time costs to get more verbose diagnostics is high, we could have a low-quality default-on diagnostic and it could mention/recommend a different/high quality/compile-time costly diagnostic" so for code bases like the Linux kernel that rely on this sort of thing a lot, with lots of always_inlining, etc, they can tradeoff toward the better diagnostic experience and codebases that don't use these features much they can still get the checking for when it does come up, but at some cost of quality/awkwardness in needing to rebuild with other flags)

But if folks decided some more invasive tracking was worth implementing (alongside the debug info codepath that already exists but is perhaps too slow to be always on) I wouldn't get up on a soapbox to strenuously object - I just think it's a bit unfortunate to build a duplicate thing, but maybe necessary.

MaskRay added inline comments.May 1 2023, 6:56 PM
clang/lib/CodeGen/CodeGenAction.cpp
860

Nit: use C++17 structured bindings

clang/test/Frontend/backend-attribute-error-warning-optimize.c
34

The new lines mix 2-space and 4-space indentations. 2 spaces should be more common.

llvm/test/Transforms/Inline/dontcall-attributes.ll
29

; CHECK-LABEL: define {{[^@]+}}@baz( (feel free to use another form, but keep define)

Otherwise @baz is not unique and may match a call void @baz(...).

31

Unneeded ^;$ line.

I have read prior discussions and https://discourse.llvm.org/t/rfc-improving-clangs-middle-and-back-end-diagnostics/69261 .

The current approach seems reasonable.

The summary should incorporate more information. inlined.from isn't mentioned at all.

clang/lib/CodeGen/CodeGenAction.cpp
861

Shall we use Diags.Report(LocCookie, ...?

llvm/test/Transforms/Inline/dontcall-attributes.ll
3

2-space indentation.

Actually I think the line is short enough, so no indentation is needed. You can use < to remove -o -

81

It's useful to have a Itanium mangled name.

probably too much, but then I wonder "how do we make Fortify work?".

(I'm still sort of on the side of "if the compile-time costs to get more verbose diagnostics is high, we could have a low-quality default-on diagnostic and it could mention/recommend a different/high quality/compile-time costly diagnostic" so for code bases like the Linux kernel that rely on this sort of thing a lot, with lots of always_inlining, etc, they can tradeoff toward the better diagnostic experience and codebases that don't use these features much they can still get the checking for when it does come up, but at some cost of quality/awkwardness in needing to rebuild with other flags)

This isn't about verbosity of the diagnostics, though, so much as it's about user experience. I'd agree that enabling more verbose diagnostic checking is reasonable to ask users to opt into, but what I'm struggling with is expecting users to opt into a mode so that the diagnostics appear in the correct places instead of detached from the source they're noting. Also, with the error and warning attributes both being used by glibc (and not just for fortify source, but as a general part of the interface), I think this is broader than just inlining decision notes.

But if folks decided some more invasive tracking was worth implementing (alongside the debug info codepath that already exists but is perhaps too slow to be always on) I wouldn't get up on a soapbox to strenuously object - I just think it's a bit unfortunate to build a duplicate thing, but maybe necessary.

Agreed on the unfortunateness of duplicating functionality; if we can avoid that, it'd be best. But I'm not sure we have more ideas on how to accomplish it, either.

aeubanks added inline comments.May 2 2023, 1:20 PM
llvm/include/llvm/IR/DiagnosticInfo.h
1122

can this return a SmallVector instead of taking one as a mutable param?

llvm/lib/IR/DiagnosticInfo.cpp
459

seems simpler if this is always a MDTuple instead of special casing one entry to be MDString

probably too much, but then I wonder "how do we make Fortify work?".

(I'm still sort of on the side of "if the compile-time costs to get more verbose diagnostics is high, we could have a low-quality default-on diagnostic and it could mention/recommend a different/high quality/compile-time costly diagnostic" so for code bases like the Linux kernel that rely on this sort of thing a lot, with lots of always_inlining, etc, they can tradeoff toward the better diagnostic experience and codebases that don't use these features much they can still get the checking for when it does come up, but at some cost of quality/awkwardness in needing to rebuild with other flags)

This isn't about verbosity of the diagnostics, though, so much as it's about user experience. I'd agree that enabling more verbose diagnostic checking is reasonable to ask users to opt into, but what I'm struggling with is expecting users to opt into a mode so that the diagnostics appear in the correct places instead of detached from the source they're noting. Also, with the error and warning attributes both being used by glibc (and not just for fortify source, but as a general part of the interface), I think this is broader than just inlining decision notes.

But if folks decided some more invasive tracking was worth implementing (alongside the debug info codepath that already exists but is perhaps too slow to be always on) I wouldn't get up on a soapbox to strenuously object - I just think it's a bit unfortunate to build a duplicate thing, but maybe necessary.

Agreed on the unfortunateness of duplicating functionality; if we can avoid that, it'd be best. But I'm not sure we have more ideas on how to accomplish it, either.

Fair enough. Well, hopefully somewhere in the summary of all this is a perf comparison - between the most narrowed down option using the existing DebugLocs and this new/distinct tracking system. But guess it's going that way.