This is an archive of the discontinued LLVM Phabricator instance.

[codegen] Store address of indirect arguments on the stack
ClosedPublic

Authored by fdeazeve on Jan 10 2023, 6:29 AM.

Details

Summary

With codegen prior to this patch, truly indirect arguments -- i.e.
those that are not byval -- can have their debug information lost even
at O0. Because indirect arguments are passed by pointer, and this
pointer is likely placed in a register as per the function call ABI,
debug information is lost as soon as the register gets clobbered.

This patch solves the issue by storing the address of the parameter on
the stack, using a similar strategy employed when C++ references are
passed. In other words, this patch changes codegen from:

define @foo(ptr %arg) {
   call void @llvm.dbg.declare(%arg, [...], metadata !DIExpression())

To:

define @foo(ptr %arg) {
   %ptr_storage = alloca ptr
   store ptr %arg, ptr %ptr_storage
   call void @llvm.dbg.declare(%ptr_storage, [...], metadata !DIExpression(DW_OP_deref))

Some common cases where this may happen with C or C++ function calls:

  1. "Big enough" trivial structures passed by value under the ARM ABI.
  2. Structures that are non-trivial for the purposes of call (as per the Itanium ABI) when passed by value.

Diff Detail

Event Timeline

fdeazeve created this revision.Jan 10 2023, 6:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 10 2023, 6:29 AM
fdeazeve requested review of this revision.Jan 10 2023, 6:29 AM
Herald added a project: Restricted Project. · View Herald TranscriptJan 10 2023, 6:29 AM
Herald added a subscriber: cfe-commits. · View Herald Transcript
fdeazeve updated this revision to Diff 487782.Jan 10 2023, 6:33 AM

Simplified condition spelling

fdeazeve updated this revision to Diff 487784.Jan 10 2023, 6:35 AM

Improved comments

clang/lib/CodeGen/CGDebugInfo.h
493

FWIW I used a const bool here to match the style already present in this class

clang/lib/CodeGen/CGDecl.cpp
2483

this is hoisted from below

This causes codegen to be different depending on whether debug-info is generated. Please don't do that.

I should expand on my previous comment. Extending the lifetime of these variables by saving the pointers into local variables, that's fine; just make sure it's not conditional on -g. The only difference between -g and not -g should be metadata and dbg.* instructions.

just make sure it's not conditional on -g.

This makes sense, I'll update the patch

Ah, so the intent is that this causes these indirect args to be handled the same way as other arguments at -O0 - placed in an alloca, eg:

struct t1 {
  t1(const t1&);
};
void f1(int, int);
void f2(t1 v, int x) {
  f1(3, 4);
}
0x00000033:     DW_TAG_formal_parameter
                  DW_AT_location        (indexed (0x0) loclist = 0x00000010: 
                     [0x0000000000000000, 0x0000000000000010): DW_OP_breg5 RDI+0
                     [0x0000000000000010, 0x0000000000000020): DW_OP_entry_value(DW_OP_reg5 RDI))
                  DW_AT_name    ("v")
...

0x0000003c:     DW_TAG_formal_parameter
                  DW_AT_location        (DW_OP_fbreg -4)
                  DW_AT_name    ("x")
...

Though I guess it doesn't change the codegen so that the alloca is used to load the value like with other values. Maybe it'd be worth changing the codegen to be more similar in that way? But maybe not. Don't know.

It seems plausible - though the impact on -O0 will be a question - if this hurts code size/perf too much (there's /some/ line there, but I don't know what it is), it couldn't go in -O0 and I'm not sure where it'd go. Logically -Og would make sense, but that's -O1 which runs mem2reg anyway, so would lose the alloca, which doesn't help...

Ah, so the intent is that this causes these indirect args to be handled the same way as other arguments at -O0 - placed in an alloca, eg:

Yup, that's correct!

Though I guess it doesn't change the codegen so that the alloca is used to load the value like with other values. Maybe it'd be worth changing the codegen to be more similar in that way?

Note that the situation here is slightly different: loading this new alloca would _not_ produce the value, it would produce the address of the value (because the argument is indirect). In that way, I'm not sure we would gain anything by loading from the alloca. Does that make sense?

though the impact on -O0 will be a question

Indeed. I'm not exactly sure what the right tradeoffs here are, but will think about this some more.

Basically you're homing a pointer to the parameter, rather than the parameter itself, which makes sense in this context.

I suspect the code size/perf impact on -O0 will not be huge, as this sounds like it's only for byval-but-not-really parameters. Normal pointer/reference parameters will be handled as they always have been. To get data about the code size impact, people typically build some large codebase with/without their patch and compare the .text sizes. It's common to use clang itself for this experiment.

I'm reminded that @wolfgangp developed a patch (which we use within Sony) to request lifetime extension as a codegen option. See his lightning talk (I thought it was posted as a patch, but I don't have a reference handy). I don't think this has any direct relationship to your patch, but it's something that our licensees have found useful.

fdeazeve updated this revision to Diff 488233.EditedJan 11 2023, 8:23 AM

Change codegen to no longer depend on -g.
A few extra tests had to be updated

fdeazeve added a comment.EditedJan 11 2023, 8:34 AM

To get data about the code size impact, people typically build some large codebase with/without their patch and compare the .text sizes. It's common to use clang itself for this experiment.

I ran a couple of experiments (*), running objdump --section-headers build/bin/clang | grep text:

Debug without patch: 178,707,948 bytes
Debug with patch: 178,785,896 bytes (+0.04%)
Release without patch: 85,734,800 bytes
Release with patch: 85,734,800 bytes (no diff)

(*) Baseline commit for the compiler == d74e36572a51, patch was applied on top of it.
The clang used in these experiments was built from commit 304838e828f9

To get data about the code size impact, people typically build some large codebase with/without their patch and compare the .text sizes. It's common to use clang itself for this experiment.

I ran a couple of experiments (*), running objdump --section-headers build/bin/clang | grep text:

Debug without patch: 178,707,948 bytes
Debug with patch: 178,785,896 bytes (+0.04%)

That does sound rather promising...

I'm guessing if we're going to add this generically for -O0 it's probably worth a thread on discourse for broader visibility.

rjmccall added a comment.EditedJan 11 2023, 2:13 PM

That's about what I would expect. One or two extra instructions per argument that are trivially removed in debug builds. Very small overall impact because there just aren't very many of these arguments.

I don't really see a need to post about this on Discourse, but it might be worth a release note.

That's about what I would expect. One or two extra instructions per argument that are trivially removed in debug builds. Very small overall impact because there just aren't very many of these arguments.

I don't really see a need to post about this on Discourse, but it might be worth a release note.

Fair enough - if you're cool with it. Yeah, release note wouldn't hurt.

clang/lib/CodeGen/CGDebugInfo.cpp
4826

We don't usually use top-level const on parameters/locals like this, for consistency (eg: with ArgNo) please remove it. Otherwise it can lead to confusion that maybe there was a & intended, etc.

clang/lib/CodeGen/CGDebugInfo.h
493

Examples of the things you were trying to be consistent with? Because the other by-value parameter here isn't const at the top level & const at the top level on function declaration parameters has no meaning, so especially here it should probably be omitted (& probably also in the definition, but it's a somewhat separate issue/has different tradeoffs)

fdeazeve added inline comments.Jan 11 2023, 4:59 PM
clang/lib/CodeGen/CGDebugInfo.h
493

The function on line 475 uses this same style:

llvm::DILocalVariable *
EmitDeclareOfAutoVariable(const VarDecl *Decl, llvm::Value *AI,
                          CGBuilderTy &Builder,
                          const bool UsePointerValue = false);

That said, I agree with you and personally would not have added const here, I'll remove it and then later we can maybe clean up other functions in this file

fdeazeve updated this revision to Diff 488611.Jan 12 2023, 5:14 AM

Address review comments:

  • Remove unncesarry const qualifier from function declaration.
  • Add an item in Clang's release notes.

I surveyed Clang's release notes in all major releases from 12.0 and couldn't
find anything similar to this, so I used my best judgement on where/how to
write it. Please let me know if you have suggestions.

dblaikie accepted this revision.Jan 12 2023, 9:37 AM

Sounds good

This revision is now accepted and ready to land.Jan 12 2023, 9:37 AM
aprantl accepted this revision.Jan 12 2023, 11:13 AM
This revision was landed with ongoing or failed builds.Jan 16 2023, 6:15 AM
This revision was automatically updated to reflect the committed changes.

Windows failures happening in Flang's MLIR stage, likely unrelated to this

There is a real regression in lldb/test/API/functionalities/param_entry_vals/basic_entry_values/ when this is compiled with O2:

__attribute__((noinline)) void func15(StructPassedViaPointerToTemporaryCopy S)

It seems that, prior to this patch, S had debug information in O2 builds. I'll investigate.

fdeazeve added a comment.EditedJan 16 2023, 8:49 AM

In hindsight, this should have been obvious.
While SROA will not touch this:

define @foo(ptr %arg) {
   call void @llvm.dbg.declare(%arg, [...], metadata !DIExpression())

It completely destroys the debug information provided by:

define @foo(ptr %arg) {
   %ptr_storage = alloca ptr
   store ptr %arg, ptr %ptr_storage
   call void @llvm.dbg.declare(%ptr_storage, [...], metadata !DIExpression(DW_OP_deref))

In other words, SROA rewrites the above to:

define @foo(ptr %arg) {
   call void @llvm.dbg.declare(undef, [...], metadata !DIExpression(DW_OP_deref))
jmorse added a subscriber: jmorse.Jan 17 2023, 3:35 AM

Hi,

While SROA will not touch this:

define @foo(ptr %arg) {
   call void @llvm.dbg.declare(%arg, [...], metadata !DIExpression())

It completely destroys the debug information provided by:

define @foo(ptr %arg) {
   %ptr_storage = alloca ptr
   store ptr %arg, ptr %ptr_storage
   call void @llvm.dbg.declare(%ptr_storage, [...], metadata !DIExpression(DW_OP_deref))

Curious -- that feels like the kind of thing we should be able to support, but adding more "special handling" to SROA isn't great.

One alternative line of investigation could be to keep the codegen changes you're adding, but leave the dbg.declare unmodified, i.e. referring to the argument value. The dbg.declare will become an indirect DBG_VALUE after isel, and if nothing optimises away the stack store, then LiveDebugValues (both flavours) might be able to track the store to stack and leave the variable homed there. I tried fiddling with llvm/test/DebugInfo/X86/spill-indirect-nrvo.ll to make that happen, however LiveDebugValues didn't follow the store, probably because it's to an alloca not a spill slot, thus there might be aliasing stores.

Curious -- that feels like the kind of thing we should be able to support, but adding more "special handling" to SROA isn't great.

Agreed! I'll have a look to see how far SROA is from being able to support this. Maybe it's almost there :)

and if nothing optimises away the stack store

This is the part that concerns me a bit, since SROA removes the alloca+store because they have no uses other than the dbg.declare.

In hindsight, this should have been obvious.
While SROA will not touch this:

define @foo(ptr %arg) {
   call void @llvm.dbg.declare(%arg, [...], metadata !DIExpression())

It completely destroys the debug information provided by:

define @foo(ptr %arg) {
   %ptr_storage = alloca ptr
   store ptr %arg, ptr %ptr_storage
   call void @llvm.dbg.declare(%ptr_storage, [...], metadata !DIExpression(DW_OP_deref))

In other words, SROA rewrites the above to:

define @foo(ptr %arg) {
   call void @llvm.dbg.declare(undef, [...], metadata !DIExpression(DW_OP_deref))

Seems reasonable to me that SROA should be able to do a better/the right job here, for this and other places where the equivalent operation might occur... but this is hardly my wheelhouse/don't take that perspective as gospel.

fdeazeve added a comment.EditedJan 17 2023, 11:54 AM

The code in Mem2Reg (Local.cpp) is wrong in the presence of DIExprs, but by luck we conservatively don't do anything.

The function below is roughly doing: "if this store covers the entire size of the DIVar, rewrite the dbg declare into a dbg value."

This is only correct if the DIExpr is empty. In most cases, "luckily" the size of a pointer doesn't match the size of the DIVar, but it should be possible to come up with IR examples where we generate invalid debug information in the presence of a OP_deref.

/// Inserts a llvm.dbg.value intrinsic before a store to an alloca'd value
/// that has an associated llvm.dbg.declare or llvm.dbg.addr intrinsic.
void llvm::ConvertDebugDeclareToDebugValue(DbgVariableIntrinsic *DII,
                                           StoreInst *SI, DIBuilder &Builder) {
  assert(DII->isAddressOfVariable() || isa<DbgAssignIntrinsic>(DII));
  auto *DIVar = DII->getVariable();
  assert(DIVar && "Missing variable");
  auto *DIExpr = DII->getExpression();
  DIExpr->startsWithDeref();
  Value *DV = SI->getValueOperand();

  DebugLoc NewLoc = getDebugValueLoc(DII);

  if (!valueCoversEntireFragment(DV->getType(), DII)) {
    // FIXME: If storing to a part of the variable described by the dbg.declare,
    // then we want to insert a dbg.value for the corresponding fragment.
    LLVM_DEBUG(dbgs() << "Failed to convert dbg.declare to dbg.value: "
                      << *DII << '\n');
    // For now, when there is a store to parts of the variable (but we do not
    // know which part) we insert an dbg.value intrinsic to indicate that we
    // know nothing about the variable's content.
    DV = UndefValue::get(DV->getType());
    Builder.insertDbgValueIntrinsic(DV, DIVar, DIExpr, NewLoc, SI);
    return;
  }

  Builder.insertDbgValueIntrinsic(DV, DIVar, DIExpr, NewLoc, SI);
}

https://reviews.llvm.org/D142160

Should address the incorrect handling of Mem2Reg, this is not the full picture though, as at the MIR level we handle these two differently:

DEBUG_VALUE .... !DIExpression ()  indirect    // this what dbg.declare gets lowered to
DEBUG_VALUE .... !DIExpression (OP_deref)      // this is what dbg.value + deref gets lowered to.