This is an archive of the discontinued LLVM Phabricator instance.

Frontend: Fix mcount inlining bug
ClosedPublic

Authored by honggyu.kim on Jul 22 2016, 1:39 AM.

Details

Summary

Since some profiling tools, such as gprof, ftrace, and uftrace, use
-pg option to generate a mcount function call at the entry of each
function. Function invocation can be detected by this hook function.

But mcount insertion is done before function inlining phase in clang,
sometime a function that already has a mcount call can be inlined in the
middle of another function.

This patch adds an attribute "counting-function" to each function
rather than emitting the mcount call directly in frontend so that this
attribute can be processed in backend. Then the mcount calls can be
properly inserted in backend after all the other optimizations are
completed.

Link: https://llvm.org/bugs/show_bug.cgi?id=28660
Signed-off-by: Honggyu Kim <hong.gyu.kim@lge.com>

Diff Detail

Event Timeline

honggyu.kim retitled this revision from to Frontend: Fix mcount inlining bug.
honggyu.kim updated this object.
honggyu.kim added reviewers: rjmccall, hans.
honggyu.kim added a subscriber: cfe-commits.

Hi John and Hans,

I don't know if I found right reviewers for this patch, but could you please have a look at it?
I found this problem while I was trying to profile a binary that is compiled with -pg and -O2 as I reported in bugzilla.
https://llvm.org/bugs/show_bug.cgi?id=28660

I know the better approach is to do function inlining first, then insert mcount calls for each function remained. But this can fix the problem in a simpler way as of now. If anyone can give me some guide or someone can move mcount insertion phase later, that would be also good.

Thanks,
Honggyu

What's the actual desired behavior here? Should the inliner strip out calls to mcount() as it inlines?

rjmccall edited edge metadata.Jul 22 2016, 11:18 AM

What's the actual desired behavior here? Should the inliner strip out calls to mcount() as it inlines?

I think they basically just want a late pass (as in immediately prior to codegen) to insert the mcount calls.

What's the actual desired behavior here? Should the inliner strip out calls to mcount() as it inlines?

I think they basically just want a late pass (as in immediately prior to codegen) to insert the mcount calls.

You're right. I wanted to move mcount insertion phase myself but it seems not that simple as I expected because mcount call is inserted in bitcode anyway, then can be optimized using opt tool to do function inlining later on. In this case, function inliner must strip mcount calls at the entry of each function.

Otherwise, we shouldn't generate mcount calls inside bitcode so that opt can simply do function inlining as it does, then code generator, llc, has to insert mcount just before code generation is done. I think this is not that simple as well.

To sum up, we have two approaches.

  1. inliner must strip mcount calls when it does function inlining.
  2. llc has to insert mcount calls just before code generation.

We have to choose one of these approaches. But I think this patch fixes the problem in a simpler way as of now.

What's the actual desired behavior here? Should the inliner strip out calls to mcount() as it inlines?

I think they basically just want a late pass (as in immediately prior to codegen) to insert the mcount calls.

You're right. I wanted to move mcount insertion phase myself but it seems not that simple as I expected because mcount call is inserted in bitcode anyway, then can be optimized using opt tool to do function inlining later on. In this case, function inliner must strip mcount calls at the entry of each function.

Otherwise, we shouldn't generate mcount calls inside bitcode so that opt can simply do function inlining as it does, then code generator, llc, has to insert mcount just before code generation is done. I think this is not that simple as well.

To sum up, we have two approaches.

  1. inliner must strip mcount calls when it does function inlining.
  2. llc has to insert mcount calls just before code generation.

We have to choose one of these approaches. But I think this patch fixes the problem in a simpler way as of now.

But it will also severely hurt performance, and that's not good either.

Adding an early IR-level pass to llc to insert calls to mcount seems straightforward. We could, for example, add some function attribute in the frontend which causes the pass to actually insert the mcount calls. There is some frontend knowledge currently regarding exactly what the mcount function is named (lib/Basic/Targets.cpp sets the MCountName variable) that we'd either need to pass in the attribute or move to the backend.

Another option is to add a new function attribute, drop_when_inlining (modulo bikeshedding), add that to the mcount function in the frontend, and teach the inliner to drop functions with that attribute when inlining.

Note that the presence of the mcount call itself will significantly impact inlining and, really, the entire optimization pipeline. Having this be a late pass seems more in keeping with what I assume is a goal that this instrumentation doesn't drastically change the generated code.

Note that the presence of the mcount call itself will significantly impact inlining and, really, the entire optimization pipeline. Having this be a late pass seems more in keeping with what I assume is a goal that this instrumentation doesn't drastically change the generated code.

I agree that the best solution is to move mcount insertion phase in the late pass so llc has to insert mcount calls for each function. But I don't have clear idea how much work has to be done for this. I confess that I don't have much experience on current LLVM code base. That's why I wrote this workaround patch to fix the bug first, then hopefully someone can move mcount insertion phase as you mentioned, or someone give me some guide how to do that. Writing a new attribute "drop_when_inlining" seems to be a bit more comfortable for me as Hal suggested.
Thanks for the comments.

honggyu.kim removed a subscriber: hfinkel.

One more problem is that -finstrument-functions has the same inlining problem. The code in llvm/tools/clang/lib/CodeGen/CodeGenFunction.cpp shows that.

void CodeGenFunction::StartFunction(GlobalDecl GD,
                                    QualType RetTy,
                                    llvm::Function *Fn,
                                    const CGFunctionInfo &FnInfo,
                                    const FunctionArgList &Args,
                                    SourceLocation Loc,
                                    SourceLocation StartLoc) {
    ...
  if (ShouldInstrumentFunction())
    EmitFunctionInstrumentation("__cyg_profile_func_enter");

  if (CGM.getCodeGenOpts().InstrumentForProfiling)
    EmitMCountInstrumentation();
    ...
}
hfinkel edited edge metadata.Jul 26 2016, 9:22 AM

Note that the presence of the mcount call itself will significantly impact inlining and, really, the entire optimization pipeline. Having this be a late pass seems more in keeping with what I assume is a goal that this instrumentation doesn't drastically change the generated code.

I agree that the best solution is to move mcount insertion phase in the late pass so llc has to insert mcount calls for each function. But I don't have clear idea how much work has to be done for this. I confess that I don't have much experience on current LLVM code base. That's why I wrote this workaround patch to fix the bug first, then hopefully someone can move mcount insertion phase as you mentioned, or someone give me some guide how to do that. Writing a new attribute "drop_when_inlining" seems to be a bit more comfortable for me as Hal suggested.
Thanks for the comments.

I agree with John; the right solution here is to insert the functions in the backend. As I understand it, the backend part is pretty simple. I'll write that part shortly so we can experiment.

Note that the presence of the mcount call itself will significantly impact inlining and, really, the entire optimization pipeline. Having this be a late pass seems more in keeping with what I assume is a goal that this instrumentation doesn't drastically change the generated code.

I agree that the best solution is to move mcount insertion phase in the late pass so llc has to insert mcount calls for each function. But I don't have clear idea how much work has to be done for this. I confess that I don't have much experience on current LLVM code base. That's why I wrote this workaround patch to fix the bug first, then hopefully someone can move mcount insertion phase as you mentioned, or someone give me some guide how to do that. Writing a new attribute "drop_when_inlining" seems to be a bit more comfortable for me as Hal suggested.
Thanks for the comments.

I agree with John; the right solution here is to insert the functions in the backend. As I understand it, the backend part is pretty simple. I'll write that part shortly so we can experiment.

Here's a patch for the backend part: D22825 - If this makes sense to everyone, we can change the frontend to add the corresponding attribute instead of inserting the function calls directly.

Here's a patch for the backend part: D22825 - If this makes sense to everyone, we can change the frontend to add the corresponding attribute instead of inserting the function calls directly.

I appreciate your help. I think I can write the frontend part that inserts 'counting-function' attribute when -pg is enabled. Please let me do this work!

Here's a patch for the backend part: D22825 - If this makes sense to everyone, we can change the frontend to add the corresponding attribute instead of inserting the function calls directly.

I appreciate your help.

No problem.

I think I can write the frontend part that inserts 'counting-function' attribute when -pg is enabled. Please let me do this work!

Great, please do :-)

honggyu.kim edited edge metadata.

I've updated the diff that can be working with D22825. I appreciate your work in backend, Hal.
Please correct me if I'm wrong.

I just wrote how I tested this in the bugzilla report page below:
https://llvm.org/bugs/show_bug.cgi?id=28660

hfinkel accepted this revision.Jul 28 2016, 12:12 PM
hfinkel edited edge metadata.

Comments about the comment, but otherwise, LGTM.

lib/CodeGen/CodeGenFunction.cpp
789

emitting mcount call -> emitting the mcount call

optimization phase -> optimizations

790

mark -> add

791

Attribute -> The attribute

792

remove ", and is processed by CountingFunctionInserter". That's an implementation detail of the backend, which might change, and is irrelevant to the frontend.

This revision is now accepted and ready to land.Jul 28 2016, 12:12 PM
honggyu.kim updated this object.
honggyu.kim edited edge metadata.
honggyu.kim marked 4 inline comments as done.

Comments about the comment, but otherwise, LGTM.

Just updated the comment as you mentioned. Thanks for correcting my English :)

Should we also modify clang/test/CodeGen/mcount.c as well? I'm not actually familiar with test infra.

$ cat llvm/tools/clang/test/CodeGen/mcount.c
// RUN: %clang_cc1 -pg -triple i386-unknown-unknown -emit-llvm -o - %s | FileCheck %s
// RUN: %clang_cc1 -pg -triple powerpc-unknown-gnu-linux -emit-llvm -o - %s | FileCheck -check-prefix=CHECK-PREFIXED %s
// RUN: %clang_cc1 -pg -triple powerpc64-unknown-gnu-linux -emit-llvm -o - %s | FileCheck -check-prefix=CHECK-PREFIXED %s
// RUN: %clang_cc1 -pg -triple powerpc64le-unknown-gnu-linux -emit-llvm -o - %s | FileCheck -check-prefix=CHECK-PREFIXED %s
// RUN: %clang_cc1 -pg -triple i386-netbsd -emit-llvm -o - %s | FileCheck -check-prefix=CHECK-PREFIXED %s
// RUN: %clang_cc1 -pg -triple x86_64-netbsd -emit-llvm -o - %s | FileCheck -check-prefix=CHECK-PREFIXED %s
// RUN: %clang_cc1 -pg -triple arm-netbsd-eabi -emit-llvm -o - %s | FileCheck -check-prefix=CHECK-PREFIXED %s
// RUN: %clang_cc1 -pg -triple aarch64-netbsd -emit-llvm -o - %s | FileCheck -check-prefix=CHECK-PREFIXED %s
// RUN: %clang_cc1 -pg -triple mips-netbsd -emit-llvm -o - %s | FileCheck -check-prefix=CHECK-PREFIXED %s
// RUN: %clang_cc1 -pg -triple powerpc-netbsd -emit-llvm -o - %s | FileCheck -check-prefix=CHECK-PREFIXED %s
// RUN: %clang_cc1 -pg -triple powerpc64-netbsd -emit-llvm -o - %s | FileCheck -check-prefix=CHECK-PREFIXED %s
// RUN: %clang_cc1 -pg -triple powerpc64le-netbsd -emit-llvm -o - %s | FileCheck -check-prefix=CHECK-PREFIXED %s
// RUN: %clang_cc1 -pg -triple sparc-netbsd -emit-llvm -o - %s | FileCheck -check-prefix=CHECK-PREFIXED %s
// RUN: %clang_cc1 -pg -triple sparc64-netbsd -emit-llvm -o - %s | FileCheck -check-prefix=CHECK-PREFIXED %s
void foo(void) {
// CHECK: call void @mcount()
// CHECK-PREFIXED: call void @_mcount()
}

Should we also modify clang/test/CodeGen/mcount.c as well? I'm not actually familiar with test infra.

Yes, you'll need to modify it to test for the attribute instead.

Should we also modify clang/test/CodeGen/mcount.c as well? I'm not actually familiar with test infra.

Yes, you'll need to modify it to test for the attribute instead.

Thanks. As I mentioned, I don't know about the test infra well so can you please give me some guide to check the attribute? I will learn this time and do it myself next time.

Should we also modify clang/test/CodeGen/mcount.c as well? I'm not actually familiar with test infra.

Yes, you'll need to modify it to test for the attribute instead.

Thanks. As I mentioned, I don't know about the test infra well so can you please give me some guide to check the attribute? I will learn this time and do it myself next time.

In this case, I think that making a simple test (changing the current test to be) like test/CodeGen/stackrealign.c would be fine. If you have any questions, please feel free to ask.

In this case, I think that making a simple test (changing the current test to be) like test/CodeGen/stackrealign.c would be fine. If you have any questions, please feel free to ask.

Thanks. Please correct me if I'm wrong.

diff --git a/test/CodeGen/mcount.c b/test/CodeGen/mcount.c
index 99dc907..31b5288 100644
--- a/test/CodeGen/mcount.c
+++ b/test/CodeGen/mcount.c
@@ -13,6 +13,7 @@
 // RUN: %clang_cc1 -pg -triple sparc-netbsd -emit-llvm -o - %s | FileCheck -check-prefix=CHECK-PREFIXED %s
 // RUN: %clang_cc1 -pg -triple sparc64-netbsd -emit-llvm -o - %s | FileCheck -check-prefix=CHECK-PREFIXED %s
 void foo(void) {
-// CHECK: call void @mcount()
-// CHECK-PREFIXED: call void @_mcount()
 }
+
+// CHECK: attributes #{{[0-9]+}} = { "counting-function"="mcount" {{.*}} }
+// CHECK-PREFIXED: attributes #{{[0-9]+}} = { "counting-function"="_mcount" {{.*}} }

In this case, I think that making a simple test (changing the current test to be) like test/CodeGen/stackrealign.c would be fine. If you have any questions, please feel free to ask.

Thanks. Please correct me if I'm wrong.

diff --git a/test/CodeGen/mcount.c b/test/CodeGen/mcount.c
index 99dc907..31b5288 100644
--- a/test/CodeGen/mcount.c
+++ b/test/CodeGen/mcount.c
@@ -13,6 +13,7 @@
 // RUN: %clang_cc1 -pg -triple sparc-netbsd -emit-llvm -o - %s | FileCheck -check-prefix=CHECK-PREFIXED %s
 // RUN: %clang_cc1 -pg -triple sparc64-netbsd -emit-llvm -o - %s | FileCheck -check-prefix=CHECK-PREFIXED %s
 void foo(void) {
-// CHECK: call void @mcount()
-// CHECK-PREFIXED: call void @_mcount()
 }
+
+// CHECK: attributes #{{[0-9]+}} = { "counting-function"="mcount" {{.*}} }

Both of the RUN lines have --check=CHECK-PREFIXED; does anything actually use the above CHECK: line? Maybe you just want to have one CHECK: line, associated with the comment with -pg, and one CHECK-NOT: line, associated with a command line without -pg.

+// CHECK-PREFIXED: attributes #{{[0-9]+}} = { "counting-function"="_mcount" {{.*}} }

honggyu.kim updated this object.

I've updated the testcase again it adds 2 more tests for -O2 and without -pg.
Please have a look at it.

$ llvm-lit mcount.c
-- Testing: 1 tests, 1 threads --
PASS: Clang :: CodeGen/mcount.c (1 of 1)
Testing Time: 32.61s
  Expected Passes    : 1

Hi Renato and Saleem,

I found one more testcase that has to be changed. It's in llvm/tools/clang/test/Frontend/gnu-mcount.c.
But strangely, some cases provide mcount name with prefix '\01' such as below:

class ARMTargetInfo : public TargetInfo {
  ...
public:
  ARMTargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts,
                bool IsBigEndian)
      : TargetInfo(Triple), FPMath(FP_Default), IsAAPCS(true), LDREX(0),
        HW_FP(0) {
    ...
    if (Triple.getOS() == llvm::Triple::Linux ||
        Triple.getOS() == llvm::Triple::UnknownOS)
      this->MCountName =
          Opts.EABIVersion == "gnu" ? "\01__gnu_mcount_nc" : "\01mcount";
  ...
  }
  ...
};

class AArch64TargetInfo : public TargetInfo {
  ...
  AArch64TargetInfo(const llvm::Triple &Triple, const TargetOptions &Opts)
      : TargetInfo(Triple), ABI("aapcs") {
    ...
    if (Triple.getOS() == llvm::Triple::Linux ||
        Triple.getOS() == llvm::Triple::UnknownOS)
      this->MCountName = Opts.EABIVersion == "gnu" ? "\01_mcount" : "mcount";
    ...
  }
  ...
};

The modified testcase does not pass because of '\01' prefix. The following example is reduced version of the testcase.

$ cat test-mcount.c
// RUN: %clang -target armv7-unknown-none-eabi -pg -meabi gnu -S -emit-llvm -o - %s | FileCheck %s -check-prefix CHECK -check-prefix CHECK-ARM-EABI-MEABI-GNU

int f() {
  return 0;
}

// CHECK-LABEL: f
// CHECK-ARM-EABI-MEABI-GNU: attributes #{{[0-9]+}} = { {{.*}}"counting-function"="\01__gnu_mcount_nc"{{.*}} }

This is not passed only because of '\01' prefix. It seems that '\01' prefix is analyzed in a strange way when string match is performed by llvm-lit.
If I remove \01 prefix in MCountName in llvm/tools/clang/test/Frontend/gnu-mcount.c file and recompile it, then test it again, the testcase passes.
Can you please check why '\01' prefix is needed?

Related links:
https://llvm.org/bugs/show_bug.cgi?id=23969
https://llvm.org/bugs/show_bug.cgi?id=27311

The \01 is to prevent the mangler from touching the function name. If you noticed the check tags in the quoted test, some targets expect it to be decorated and others do not. What exactly do you mean that lit has a strings matching with \01? Its not a string \01, its the octal character 1.

The \01 is to prevent the mangler from touching the function name. If you noticed the check tags in the quoted test, some targets expect it to be decorated and others do not. What exactly do you mean that lit has a strings matching with \01? Its not a string \01, its the octal character 1.

Thanks for the explanation. I mean the following statement is not properly matched by llvm-lit even if frontend generate mcount name as "\01__gnu_mcount_nc".

// CHECK-ARM-EABI-MEABI-GNU: attributes #{{[0-9]+}} = { {{.*}}"counting-function"="\01__gnu_mcount_nc"{{.*}} }

If I remove '\01' prefix in frontend then generate just "__gnu_mcount_nc", the following statement that also doesn't have '\01' prefix is matched without problem.

// CHECK-ARM-EABI-MEABI-GNU: attributes #{{[0-9]+}} = { {{.*}}"counting-function"="__gnu_mcount_nc"{{.*}} }

Do you have any idea why this problem happens?

I see the difference now.
Below is the originally generated IR without this patch. It shows \01 clearly in call void @"\01__gnu_mcount_nc"().

$ clang -target armv7-unknown-none-eabi -pg -meabi gnu -S -emit-llvm -o - test-mcount.c
; ModuleID = 'mcount.c'
source_filename = "mcount.c"
target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "armv7-unknown-none-eabi"

; Function Attrs: nounwind
define i32 @f() #0 {
  call void @"\01__gnu_mcount_nc"() #1
  ret i32 0
}

declare void @"\01__gnu_mcount_nc"()

attributes #0 = { nounwind "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="cortex-a8" "target-features"="+dsp,+neon,+strict-align,+vfp3" "unsafe-fp-math"="false" "use-soft-float"="false" }
attributes #1 = { nounwind }
  ...

Here is the IR result with this patch. But it doesn't show \01 in front of __gnu_mcount_nc in attribute list. It shows only "counting-function"="__gnu_mcount_nc" here but if I dump the output into a file then open it, it shows as "counting-function"="^A__gnu_mcount_nc". Maybe there's some mishandling of \01 prefix whiling passing string that I didn't find yet.

$ clang -target armv7-unknown-none-eabi -pg -meabi gnu -S -emit-llvm -o - test-mcount.c
; ModuleID = 'mcount.c'
source_filename = "mcount.c"
target datalayout = "e-m:e-p:32:32-i64:64-v128:64:128-a:0:32-n32-S64"
target triple = "armv7-unknown-none-eabi"

; Function Attrs: nounwind
define i32 @f() #0 {
  ret i32 0
}

attributes #0 = { nounwind "counting-function"="__gnu_mcount_nc" "disable-tail-calls"="false" "less-precise-fpmad"="false" "no-frame-pointer-elim"="true" "no-frame-pointer-elim-non-leaf" "no-infs-fp-math"="false" "no-jump-tables"="false" "no-nans-fp-math"="false" "no-signed-zeros-fp-math"="false" "stack-protector-buffer-size"="8" "target-cpu"="cortex-a8" "target-features"="+dsp,+neon,+strict-align,+vfp3" "unsafe-fp-math"="false" "use-soft-float"="false" }
  ...

The \01 is to prevent the mangler from touching the function name. If you noticed the check tags in the quoted test, some targets expect it to be decorated and others do not. What exactly do you mean that lit has a strings matching with \01? Its not a string \01, its the octal character 1.

Hi Saleem,

It seems that name mangling is done before mcount call insertion so I think we don't have to put \01 prefix for mcount names. I have tested on x86-64 and ARM. Are there any other targets that work in a different order?
I've checked this clang::CodeGen::CodeGenModule::getMangledName() is always executed before clang::CodeGen::CodeGenFunction::StartFunction(), which actually inserts mcount calls.
Please correct me if I'm wrong

Theres a frontend mangler and a backend mangler. You should try this on Windows GNU environments :-).

Theres a frontend mangler and a backend mangler. You should try this on Windows GNU environments :-).

Thanks for the confirmation :)
If so, would it be okay if I put one more backslash for each mcount name with '\0' prefix? It shows the output properly with '\0' in IR generation. For example,

@@ -4920,7 +4920,7 @@ public:
     if (Triple.getOS() == llvm::Triple::Linux ||
         Triple.getOS() == llvm::Triple::UnknownOS)
       this->MCountName =
-          Opts.EABIVersion == "gnu" ? "\01__gnu_mcount_nc" : "\01mcount";
+          Opts.EABIVersion == "gnu" ? "\\01__gnu_mcount_nc" : "\\01mcount";
   }

   StringRef getABI() const override { return ABI; }

test-mcount.c test above can be passed with this modification.

No, the inserted character is a literal SOH, not the string "\01".

No, the inserted character is a literal SOH, not the string "\01".

I don't know why the string is passed in a different way but '\01' is shown in IR previously as below

$ clang -target armv7-unknown-none-eabi -pg -meabi gnu -S -emit-llvm -o - test-mcount.c
  ...
define i32 @f() #0 {
  call void @"\01__gnu_mcount_nc"() #1
  ret i32 0
}
  ...

But it doesn't show '\01' in IR with this modifiction.

$ clang -target armv7-unknown-none-eabi -pg -meabi gnu -S -emit-llvm -o - test-mcount.c
  ...
attributes #0 = { nounwind "counting-function"="__gnu_mcount_nc" ... }
  ...

The expected result is as follows. '\01' is clearly shown in IR.

$ clang -target armv7-unknown-none-eabi -pg -meabi gnu -S -emit-llvm -o - test-mcount.c
  ...
attributes #0 = { nounwind "counting-function"="\01__gnu_mcount_nc" ... }
  ...

Since I couldn't find why string passing is handled in a different way with previous version, I just put double backslash in "\\01__gnu_mcount_nc" because it generates the expected IR output. I know that we should have clear reason for this.
Do you have any idea why this problem happens?

honggyu.kim added a reviewer: compnerd.
honggyu.kim removed a subscriber: compnerd.

I have fixed SOH character problem by D23792. I've verified that mcount.c and gnu-mcount.c tests are passed now.

$ llvm-lit tools/clang/test/CodeGen/mcount.c
-- Testing: 1 tests, 1 threads --
PASS: Clang :: CodeGen/mcount.c (1 of 1)
Testing Time: 0.81s
  Expected Passes    : 1

$ llvm-lit tools/clang/test/Frontend/gnu-mcount.c
-- Testing: 1 tests, 1 threads --
PASS: Clang :: Frontend/gnu-mcount.c (1 of 1)
Testing Time: 10.36s
  Expected Passes    : 1

I think it's almost ready to be commited now. So please have a look at it with D23792.

This revision was automatically updated to reflect the committed changes.