Page MenuHomePhabricator

MergeFunctions: Preserve debug info in thunks, under option -mergefunc-preserve-debug-info
ClosedPublic

Authored by appcs on Dec 22 2016, 7:41 PM.

Details

Summary

Under -mergefunc-preserve-debug-info MergeFunctions::writeThunk():

  • Does not create a new function for the thunk.
  • Retains the debug info (and associated instructions) in the entry block for the thunk's parameters. PS: -debug will display the algorithm at work.
  • Creates debug-info for the call (to the "master-copy") made by the thunk and its return value.
  • Does not mark the call made by the thunk as a tail-call, so that the backtrace indicates the execution flow at -O0.
  • Erases the rest of the function, retaining the (minimally sized) entry block.

GDB (7.11.1) features verified to work at [-O0, -O2]:
(gdb) info functions # Thunked function will now be listed.
(gdb) step <thunked function> # Reaches the "master-copy" function that the thunk calls.
(gdb) backtrace # When inside the thunk, will show call chain leading up to the thunk

    1. When we step into the master-copy function called by the thunk, will
    2. show call chain leading upto the master-copy (including thunk, one-above, at -O0)
    3. but PS: at -O<non-zero> the call does get tail-call'optimized, so will not show, as generally. (gdb) finish # When inside the thunk and when inside the master-copy when called from the thunk (gdb) break # On thunked function
  1. On master-copy function, breaks on newly inserted call made from within thunk
  2. On call-site of thunked function in another translation unit (not the TU of its definition)

Diff Detail

Repository
rL LLVM

Event Timeline

appcs updated this revision to Diff 82397.Dec 22 2016, 7:41 PM
appcs retitled this revision from to MergeFunctions: Preserve debug info in thunks, under option -mergefunc-preserve-debug-info.
appcs updated this object.
appcs added reviewers: echristo, aprantl, friss, eeckstein.
appcs added a subscriber: llvm-commits.
fhahn added a subscriber: fhahn.Dec 26 2016, 1:55 PM

Thanks for your patch! It's not my area of expertise but I think the size of the test case could be reduced quite a bit.

test/Transforms/MergeFunc/thunk-debugability.ll
23 ↗(On Diff #82397)

Could the size of the dotSumA an dotSumA be reduced and still illustrate the problem?

225 ↗(On Diff #82397)

Could the function here just take a, b and c as parameters to make it simple? Also, do we need the debug intrinsics here?

259 ↗(On Diff #82397)

Can we get rid of this?

261 ↗(On Diff #82397)

I think you don't need attributes for the test case.

269 ↗(On Diff #82397)

You should be able to get rid of most of the debug metadata I think.

435 ↗(On Diff #82397)

I think people tend to stick that at the top of the file (after the RUN lines)

476 ↗(On Diff #82397)

It looks like this is not part of the test file.

aprantl edited edge metadata.Jan 3 2017, 10:23 AM

Thank you for working on this! Below are a couple of question to help my understanding:

Under -mergefunc-preserve-debug-info MergeFunctions::writeThunk():>
Does not create a new function for the thunk.
Retains the debug info (and associated instructions) in the entry block for the thunk's parameters. PS: -debug will display the algorithm at work.

So if I understand correctly this patch causes debug info for the function arguments to be preserved in the thunk for the merged function. Will the call sites of the merged function point to the thunk or directly to the shared implementation? Or is the. thunk only there to be invoked from the expression evaluator and other translation units?

Creates debug-info for the call (to the "master-copy") made by the thunk and its return value.

So in the example in the testcase, if I'm stopped at y = dotSumB(c, a, b, 8); in the debugger I would see:

bt
dotSumA() // confusing
dotSumB() // the thunk with the debug info
main()

Does not mark the call made by the thunk as a tail-call, so that the backtrace indicates the execution flow at -O0.

  • Is compiling with function merging and -O0 a combination used by anyone in practice?
  • Wouldn't it be *better* to use a tail-call in the thunk, so we don't see the misleading debug info for dotSumA()?

(I think you gave the answer for this later on: it will get auto-converted to a tail call, but it never hurst to clarify).

Adrian

lib/Transforms/IPO/MergeFunctions.cpp
234 ↗(On Diff #82397)

Nitpick: missing . (and remainder of sentence?) at the end.

test/Transforms/MergeFunc/thunk-debugability.ll
441 ↗(On Diff #82397)

Is all of this complexity necessary, or could the code be reduced further (e.g., but just doing the sum of p and eliminating q, etc...)?

appcs added inline comments.Jan 3 2017, 3:37 PM
lib/Transforms/IPO/MergeFunctions.cpp
699 ↗(On Diff #82397)

Noting that I'll change: "() {\n" -> "()\n" in the re-spin.

appcs added a comment.Jan 4 2017, 4:52 PM
Thank you for your review, Adrian.

True, under -mergefunc-preserve-debug-info the existing debug info (from the entry block) for the merged function's arguments is
preserved.

Call sites of the merged function occurring from within the TU of the merged function's definition will point directly to the shared
implementation and call sites of the merged function that are external to the TU of the merged function's definition call the thunk
for the merged function (which tail call's the shared implementation passing forward the incoming arguments). This is existing
behaviour under -mergefunc and remains unchanged under -mergefunc-preserve-debug-info (except for the tail call part).

So, given two source files, thunk-debugability.c & thunk-debugability-aux.c with content as follows:

thunk-debugability.c {
      1 int sumA(int *a, int size) {
      2   int i, s;
      3   for (i = 0, s = 0; i < size; i++)
      4     s += a[i];
      5   return s;
      6 }
      7
      8 int sumB(int *a, int size) {
      9   int i, s;
     10   for (i = 0, s = 0; i < size; i++)
     11     s += a[i];
     12   return s;
     13 }
     14
     15 extern int sumExternalTest(int *p, int size);
     16
     17 int main(void) {
     18
     19   int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
     20
     21   int x, y;
     22
     23   x = sumA(a, 8);
     24   y = sumB(a, 8);
     25   sumExternalTest(a, 8);
     26
     27   return (x == y) ? 0 : -1;
     28 }
}

thunk-debugability-aux.c {
      1 extern int sumA(int *a, int size);
      2 extern int sumB(int *a, int size);
      3
      4 int sumExternalTest(int *p, int size) {
      5
      6   int x, y;
      7
      8   x = sumA(p, 8);
      9   y = sumB(p, 8);
     10
     11   return (x == y ? x : -1);
     12 }
}

Under -mergefunc (i.e. existing behaviour) {
  sumA() [shared implementation] remains as per the definition given by the user
  sumB() [merged function] as defined by the user is erased and a new sumB() is created containing a tail call to sumA()
  The call made by main() to sumB() is replaced by a call to sumA()
  The call to sumB() made by the external caller sumExternalTest() will call the thunk sumB() and in turn, sumB() will tail call sumA()
}

Under -mergefunc -mergefunc-preserve-debug-info (i.e. new behaviour) {
  sumA() [shared implementation] remains as per the definition given by the user
  sumB() [merged function] as defined by the user is retained, but is transformed as follows: {
    The debug info for the arguments in the entry block is preserved
    A call is made to sumA() passing forward the arguments received by sumB()
    Debug info is created for the call to sumA() and its return value
    The call to sumA() is not marked as a tail call [PS: Needs elaboration/clarification]
    The rest of the CFG for sumB() is erased
  }
  The call made by main() to sumB() is replaced by a call to sumA()
  The call to sumB() made by the external caller sumExternalTest() will call the thunk sumB() and in turn, sumB() will call sumA()
}

The rationale for the new behaviour is that under -mergefunc-preserve-debug-info it is possible to step from the call site of the
merged function into its thunk and from there, into the shared implementation with the backtrace truly indicating the execution 
flow. Once that new flow (which is the essence of -mergefunc modulo the tail call, which is just optimization) is understood and
debugged, the user can recompile with -mergefunc-preserve-debug-info removed, if the need be.

With the above example source code (compiled at -O0 -mergefunc -mergefunc-preserve-debug-info):

Backtraces: {

  With call to sumA() not marked as a tail call in thunk sumB(): {

    Setting a breakpoint at a thunked function call site within the TU of its definition: {
      (gdb) break thunk-debugability.c:24
      Breakpoint 1 at 0x400609: file ./thunk-debugability.c, line 24.
      (gdb) run
      Starting program: /auto/compiler-migration/anmparal/code/upstreaming/MFI/thunk-debugability.mfig.exe

      Breakpoint 1, main () at ./thunk-debugability.c:24
      24        y = sumB(a, 8);
      (gdb) step
      sumA (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:3
      3         for (i = 0, s = 0; i < size; i++)
      (gdb) bt
      #0  sumA (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:3
      #1  0x000000000040060e in main () at ./thunk-debugability.c:24
      (gdb)
    }

    Setting a breakpoint at a thunked function call site outside the TU of its definition: {
      (gdb) break thunk-debugability-aux.c:9
      Breakpoint 2 at 0x400507: file ./thunk-debugability-aux.c, line 9.
      (gdb) run
      The program being debugged has been started already.
      Start it from the beginning? (y or n) y
      Starting program: /auto/compiler-migration/anmparal/code/upstreaming/MFI/thunk-debugability.mfig.exe

      Breakpoint 2, sumExternalTest (p=0x7fffffffd6d0, size=8) at ./thunk-debugability-aux.c:9
      9         y = sumB(p, 8);
      (gdb) step
      sumB (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:8
      8       int sumB(int *a, int size)
      (gdb) bt
      #0  sumB (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:8
      #1  0x0000000000400510 in sumExternalTest (p=0x7fffffffd6d0, size=8) at ./thunk-debugability-aux.c:9
      #2  0x000000000040061f in main () at ./thunk-debugability.c:25
      (gdb) step
      sumA (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:3
      3         for (i = 0, s = 0; i < size; i++)
      (gdb) bt
      #0  sumA (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:3
      #1  0x00000000004005a4 in sumB (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:8
      #2  0x0000000000400510 in sumExternalTest (p=0x7fffffffd6d0, size=8) at ./thunk-debugability-aux.c:9
      #3  0x000000000040061f in main () at ./thunk-debugability.c:25
      (gdb)
    }
  }

  With call to sumA() marked as a tail call in thunk sumB(): {

    Setting a breakpoint at a thunked function call site within the TU of its definition: {
      (gdb) break thunk-debugability.c:24
      Breakpoint 1 at 0x400609: file ./thunk-debugability.c, line 24.
      (gdb) run
      Starting program: /auto/compiler-migration/anmparal/code/upstreaming/MFI/thunk-debugability.mfig.exe

      Breakpoint 1, main () at ./thunk-debugability.c:24
      24        y = sumB(a, 8);
      (gdb) step
      sumA (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:3
      3         for (i = 0, s = 0; i < size; i++)
      (gdb) bt
      #0  sumA (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:3
      #1  0x000000000040060e in main () at ./thunk-debugability.c:24
      (gdb)
    }

    Setting a breakpoint at a thunked function call site outside the TU of its definition: {
      (gdb) break thunk-debugability-aux.c:9
      Breakpoint 2 at 0x400507: file ./thunk-debugability-aux.c, line 9.
      (gdb) run
      Starting program: /auto/compiler-migration/anmparal/code/upstreaming/MFI/thunk-debugability.mfig.exe

      Breakpoint 2, sumExternalTest (p=0x7fffffffd6d0, size=8) at ./thunk-debugability-aux.c:9
      9         y = sumB(p, 8);
      (gdb) step
      sumB (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:8
      8       int sumB(int *a, int size)
      (gdb) bt
      #0  sumB (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:8
      #1  0x0000000000400510 in sumExternalTest (p=0x7fffffffd6d0, size=8) at ./thunk-debugability-aux.c:9
      #2  0x000000000040061f in main () at ./thunk-debugability.c:25
      (gdb) step
      sumA (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:1
      1       int sumA(int *a, int size)
      (gdb) bt
      #0  sumA (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:1
      #1  0x0000000000400510 in sumExternalTest (p=0x7fffffffd6d0, size=8) at ./thunk-debugability-aux.c:9
      #2  0x000000000040061f in main () at ./thunk-debugability.c:25
      (gdb)
    }
  }
}

So, these two scenarios have identical behaviour: {
  With call to sumA() not marked as a tail call in thunk sumB() and setting a breakpoint at a thunked function call site within the TU of its definition.
  With call to sumA()     marked as a tail call in thunk sumB() and setting a breakpoint at a thunked function call site within the TU of its definition.
}

Whereas, in: {
  With call to sumA() not marked as a tail call in thunk sumB() and setting a breakpoint at a thunked function call site outside the TU of its definition.
  With call to sumA()     marked as a tail call in thunk sumB() and setting a breakpoint at a thunked function call site outside the TU of its definition.
}
- the behaviour differs: When the call to sumA() is not marked as a tail call, the execution flow is clear (from call site, to thunk,
to shared implementation)  but when the call to sumA() is marked as a tail call, note how across the step, sumB() is ("suddenly")
replaced by sumA()
Which behaviour do we feel is better for the debug experience?

I need to clarify: It is not true that the call to sumA() will be auto-converted into a tail call at -O<non-zero>
I apologize for making a misleading statement; I did not realize that I was passing an already -O<non-zero>'ed LLVM IR file to
opt -O<non-zero> -mergefunc -mergefunc-preserve-debug-info

So, do we want to conditionally tail call the shared implementation?
[debug]      At -O0          -mergefunc -mergefunc-preserve-debug-info - it is an actual call
[production] At -O<non-zero> -mergefunc -mergefunc-preserve-debug-info - it is a  tail   call
- or do we just unconditionally keep the tail call, as-(currently)-is?

--

Thank you and @fhahn for the review; Please let me know which way we want to resolve the question of the tail call and
I'll re-spin, with the test reworked with the above (keeping a single TU) example code, incorporating the other feedback.

Anmol.

Call sites of the merged function occurring from within the TU of the merged function's definition will point directly to the shared implementation and call sites of the merged function that are external to the TU of the merged function's definition call the thunk for the merged function (which tail call's the shared implementation passing forward the incoming arguments). This is existing behaviour under -mergefunc and remains unchanged under -mergefunc-preserve-debug-info (except for the tail call part).

Is this the right thing to do, though? I would think that for better debugability users would always prefer the version with the thunk even in the current TU. Then again, subsequent optimization passes would likely inline the thunk and thus potentially undo most of the effect. Have you experimented with this variant? Would the debug info from the thunk survive?

appcs added a comment.Jan 5 2017, 6:21 PM

Call sites of the merged function occurring from within the TU of the merged function's definition will point directly to the shared implementation and call sites of the merged function that are external to the TU of the merged function's definition call the thunk for the merged function (which tail call's the shared implementation passing forward the incoming arguments). This is existing behaviour under -mergefunc and remains unchanged under -mergefunc-preserve-debug-info (except for the tail call part).

Is this the right thing to do, though? I would think that for better debugability users would always prefer the version with the thunk even in the current TU. Then again, subsequent optimization passes would likely inline the thunk and thus potentially undo most of the effect. Have you experimented with this variant? Would the debug info from the thunk survive?

Thank you; I agree that this greatly enhances the debug experience. Automatically modifying the call site with a direct call to the shared implementation instead of (what is now) the thunk, when within the TU, is confusing when debugging. Stepping into the thunk from the call site and from within the thunk, into the shared implementation keeps the debug experience uniform for internal and external call sites of functions that become thunks. (If you ask me, this is similar in spirit to not marking the thunk's call to the shared implementation as a tail call to help debugging - the full back trace is the same in both cases and otherwise, there is a certain element of surprise in both cases. Basically, we are saying that under -mergefunc-preserve-debug-info we trade off optimization partly (at -O0) to aid debugability, modifying the transformation slightly, to make the execution flow explicit). The same question remains for thunk call sites within the TU; do we do this:

[debug] At -O0 -mergefunc -mergefunc-preserve-debug-info - call thunk even though within the same TU
[production] At -O<non-zero> -mergefunc -mergefunc-preserve-debug-info - call shared implementation [as-(currently)-is]

MergeFunctions runs pretty much at the end of the sequence of optimization phases (only prior to Loop sinking and Instruction Simplify), so inlining has already happened at the call-sites.

I did verify that (when the thunk does not tail call the shared implementation), the preserved debug info survives even when merge-functions is done prior to inlining: {

{

 1 int sumA(int *a, int size) {
 2   int i, s;
 3   for (i = 0, s = 0; i < size; i++)
 4     s += a[i];
 5   return s;
 6 }
 7 
 8 int sumB(int *a, int size) {
 9   int i, s;
10   for (i = 0, s = 0; i < size; i++)
11     s += a[i];
12   return s;
13 }
14 
15 int main(void) {
16 
17   int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
18 
19   int x;
20 
21   x = sumB(a, 8);
22 
23   return 0;
24 }

}

define i32 @sumB(i32* %a, i32 %size) #0 !dbg !44 {

...// inlined code from sumA()
store i32* %a, i32** %a.addr, align 8
call void @llvm.dbg.declare(metadata i32** %a.addr, metadata !50, metadata !13), !dbg !51
store i32 %size, i32* %size.addr, align 4
call void @llvm.dbg.declare(metadata i32* %size.addr, metadata !52, metadata !13), !dbg !53
...// rest of inlined code from sumA()

}

define i32 @main() #0 !dbg !69 {

entry:
 ...// inlined code from sumB()
 %a.addr.i = alloca i32*, align 8
 call void @llvm.dbg.declare(metadata i32** %a.addr.i, metadata !50, metadata !13), !dbg !78
 %size.addr.i = alloca i32, align 4
 call void @llvm.dbg.declare(metadata i32* %size.addr.i, metadata !52, metadata !13), !dbg !79
 ...// rest of inlined code from sumB()

}

...

!50 = !DILocalVariable(name: "a", arg: 1, scope: !44, file: !7, line: 8, type: !11)
!51 = !DILocation(line: 8, column: 15, scope: !44)
!52 = !DILocalVariable(name: "size", arg: 2, scope: !44, file: !7, line: 8, type: !10)
!53 = !DILocation(line: 8, column: 22, scope: !44)
...
!78 = !DILocation(line: 8, column: 15, scope: !44, inlinedAt: !74)
!79 = !DILocation(line: 8, column: 22, scope: !44, inlinedAt: !74)

...

}

Also, when the thunk does tail call the shared implementation, the exact same LLVM IR as above is generated (i.e. as when the thunk does not tail call the shared implementation).

Call sites of the merged function occurring from within the TU of the merged function's definition will point directly to the shared implementation and call sites of the merged function that are external to the TU of the merged function's definition call the thunk for the merged function (which tail call's the shared implementation passing forward the incoming arguments). This is existing behaviour under -mergefunc and remains unchanged under -mergefunc-preserve-debug-info (except for the tail call part).

Is this the right thing to do, though? I would think that for better debugability users would always prefer the version with the thunk even in the current TU. Then again, subsequent optimization passes would likely inline the thunk and thus potentially undo most of the effect. Have you experimented with this variant? Would the debug info from the thunk survive?

Thank you; I agree that this greatly enhances the debug experience. Automatically modifying the call site with a direct call to the shared implementation instead of (what is now) the thunk, when within the TU, is confusing when debugging. Stepping into the thunk from the call site and from within the thunk, into the shared implementation keeps the debug experience uniform for internal and external call sites of functions that become thunks. (If you ask me, this is similar in spirit to not marking the thunk's call to the shared implementation as a tail call to help debugging - the full back trace is the same in both cases and otherwise, there is a certain element of surprise in both cases. Basically, we are saying that under -mergefunc-preserve-debug-info we trade off optimization partly (at -O0) to aid debugability, modifying the transformation slightly, to make the execution flow explicit).

Yes. I think this is the right approach. When using -mergefunc-preserve-debug-info we should always call the thunk even in the same TU.

The same question remains for thunk call sites within the TU; do we do this:

[debug] At -O0 -mergefunc -mergefunc-preserve-debug-info - call thunk even though within the same TU

Is this a scenario that anyone would use in practice, or did you just add that for comparison?

[production] At -O<non-zero> -mergefunc -mergefunc-preserve-debug-info - call shared implementation [as-(currently)-is]

MergeFunctions runs pretty much at the end of the sequence of optimization phases (only prior to Loop sinking and Instruction Simplify), so inlining has already happened at the call-sites.

Oh interesting. I (never having studied the PassManager before) was expecting the transformations to run in some kind of fix-point iteration.

I did verify that (when the thunk does not tail call the shared implementation), the preserved debug info survives even when merge-functions is done prior to inlining: {

{
     1 int sumA(int *a, int size) {
     2   int i, s;
     3   for (i = 0, s = 0; i < size; i++)
     4     s += a[i];
     5   return s;
     6 }
     7 
     8 int sumB(int *a, int size) {
     9   int i, s;
    10   for (i = 0, s = 0; i < size; i++)
    11     s += a[i];
    12   return s;
    13 }
    14 
    15 int main(void) {
    16 
    17   int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
    18 
    19   int x;
    20 
    21   x = sumB(a, 8);
    22 
    23   return 0;
    24 }
}
 
define i32 @sumB(i32* %a, i32 %size) #0 !dbg !44 {
  ...// inlined code from sumA()
  store i32* %a, i32** %a.addr, align 8
  call void @llvm.dbg.declare(metadata i32** %a.addr, metadata !50, metadata !13), !dbg !51
  store i32 %size, i32* %size.addr, align 4
  call void @llvm.dbg.declare(metadata i32* %size.addr, metadata !52, metadata !13), !dbg !53
  ...// rest of inlined code from sumA()
}
 
define i32 @main() #0 !dbg !69 {
 entry:
  ...// inlined code from sumB()
  %a.addr.i = alloca i32*, align 8
  call void @llvm.dbg.declare(metadata i32** %a.addr.i, metadata !50, metadata !13), !dbg !78
  %size.addr.i = alloca i32, align 4
  call void @llvm.dbg.declare(metadata i32* %size.addr.i, metadata !52, metadata !13), !dbg !79
  ...// rest of inlined code from sumB()
}
 
...
 
!50 = !DILocalVariable(name: "a", arg: 1, scope: !44, file: !7, line: 8, type: !11)
!51 = !DILocation(line: 8, column: 15, scope: !44)
!52 = !DILocalVariable(name: "size", arg: 2, scope: !44, file: !7, line: 8, type: !10)
!53 = !DILocation(line: 8, column: 22, scope: !44)
...
!78 = !DILocation(line: 8, column: 15, scope: !44, inlinedAt: !74)
!79 = !DILocation(line: 8, column: 22, scope: !44, inlinedAt: !74)
 ...

}

Also, when the thunk does tail call the shared implementation, the exact same LLVM IR as above is generated (i.e. as when the thunk does not tail call the shared implementation).

appcs added a comment.Jan 6 2017, 10:11 AM

Call sites of the merged function occurring from within the TU of the merged function's definition will point directly to the shared implementation and call sites of the merged function that are external to the TU of the merged function's definition call the thunk for the merged function (which tail call's the shared implementation passing forward the incoming arguments). This is existing behaviour under -mergefunc and remains unchanged under -mergefunc-preserve-debug-info (except for the tail call part).

Is this the right thing to do, though? I would think that for better debugability users would always prefer the version with the thunk even in the current TU. Then again, subsequent optimization passes would likely inline the thunk and thus potentially undo most of the effect. Have you experimented with this variant? Would the debug info from the thunk survive?

Thank you; I agree that this greatly enhances the debug experience. Automatically modifying the call site with a direct call to the shared implementation instead of (what is now) the thunk, when within the TU, is confusing when debugging. Stepping into the thunk from the call site and from within the thunk, into the shared implementation keeps the debug experience uniform for internal and external call sites of functions that become thunks. (If you ask me, this is similar in spirit to not marking the thunk's call to the shared implementation as a tail call to help debugging - the full back trace is the same in both cases and otherwise, there is a certain element of surprise in both cases. Basically, we are saying that under -mergefunc-preserve-debug-info we trade off optimization partly (at -O0) to aid debugability, modifying the transformation slightly, to make the execution flow explicit).

Yes. I think this is the right approach. When using -mergefunc-preserve-debug-info we should always call the thunk even in the same TU.

Thank you for confirming.

What should -mergefunc-preserve-debug-info do about the thunk's call to the shared implementation?
Mark as tail call or not? The existing -mergefunc behaviour is to mark it as a tail call. We could leave it as is,
unless someone specifically asks for a change under -mergefunc-preserve-debug-info; would that be OK?

The same question remains for thunk call sites within the TU; do we do this:

[debug] At -O0 -mergefunc -mergefunc-preserve-debug-info - call thunk even though within the same TU

Is this a scenario that anyone would use in practice, or did you just add that for comparison?

I have not added it yet; I'm thinking that for now, we'll not add the optimization level to the consideration under -mergefunc-preserve-debug-info unless some real user specifically asks for it. Does that sound reasonable?

[production] At -O<non-zero> -mergefunc -mergefunc-preserve-debug-info - call shared implementation [as-(currently)-is]

MergeFunctions runs pretty much at the end of the sequence of optimization phases (only prior to Loop sinking and Instruction Simplify), so inlining has already happened at the call-sites.

Oh interesting. I (never having studied the PassManager before) was expecting the transformations to run in some kind of fix-point iteration.

I was looking at lib/Transforms/IPO/PassManagerBuilder.cpp/PassManagerBuilder::populateModulePassManager()
and saw it listed immediately above those passes, but I was not aware of:
-debug-pass=Structure - print pass structure before run()
which lists Merge Functions running pretty much as one of the last passes at -O2, but the tail bit is:

Merge Functions                                                                   
Print module to stderr                                                            
FunctionPass Manager                                                              
  Module Verifier                                                                 
  Print function to stderr                                                        
Bitcode Writer

I found that http://llvm.org/docs/CommandGuide/opt.html says:
"The order in which the options occur on the command line are the order in which they are executed (within pass constraints)."
so, that is actually how I got -inline to run after -mergefunc for the experiment.

  • Thank you.
What should -mergefunc-preserve-debug-info do about the thunk's call to the shared implementation?
Mark as tail call or not? The existing -mergefunc behaviour is to mark it as a tail call. We could leave it as is,
unless someone specifically asks for a change under -mergefunc-preserve-debug-info; would that be OK?

I think marking it as a tail call makes most sense.

appcs added a comment.Jan 6 2017, 1:59 PM
What should -mergefunc-preserve-debug-info do about the thunk's call to the shared implementation?
Mark as tail call or not? The existing -mergefunc behaviour is to mark it as a tail call. We could leave it as is,
unless someone specifically asks for a change under -mergefunc-preserve-debug-info; would that be OK?

I think marking it as a tail call makes most sense.

ACK; thanks.

appcs updated this revision to Diff 83501.Jan 6 2017, 8:50 PM

Under -mergefunc-preserve-debug-info

  • A thunk's call site is preserved to point to the thunk when both occur within the same translation unit.

A thunk makes a tail call to the shared implementation (i.e. keep -mergefunc defined base behaviour).

Review comments have been incorporated; in particular, the test case has been reworked to:

  • Use smaller & simpler functions that illustrate the problem.
  • The debug intrinsics are minimal.
  • Unnecessary debug metadata has been removed.
  • Attributes have been removed.
appcs marked 2 inline comments as done.Jan 6 2017, 8:59 PM

Submitting ticked off items.

appcs added a comment.Jan 11 2017, 7:56 AM

Is this good?

aprantl added a subscriber: davide.Jan 12 2017, 9:51 AM

Added some more stylistic changes.

Do you have any plans for surfacing this in clang?

lib/Transforms/IPO/MergeFunctions.cpp
632 ↗(On Diff #83501)

http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments

Don’t duplicate the documentation comment in the header file and in the implementation file. Put the documentation comments for public APIs into the header file. Documentation comments for private APIs can go to the implementation file. In any case, implementation files can include additional comments (not necessarily in Doxygen markup) to explain implementation details as needed.

695 ↗(On Diff #83501)

if (DIS)

appcs added a comment.Jan 12 2017, 4:51 PM

Added some more stylistic changes.

Thank you; I will incorporate.

Do you have any plans for surfacing this in clang?

Did you mean add a clang option, like say: -fmerge-functions-preserve-debug-info
(Will also necessitate a prerequisite: -fmerge-functions) I'll submit these clang patches subsequently.
Please let me know of your suggestions for the names; I tend to go for really long names.
Thank you.

There are several options I could think of:
Either surfacing it through a driver option, enabling it at -Og, or benchmarking it and arguing to enable it by default at some optimization settings.

aprantl accepted this revision.Jan 13 2017, 9:37 AM
aprantl edited edge metadata.

One final request: there should be some documentation in the source code that explain what the trade-offs of enabling this option are, and what exactly it is doing.

Otherwise this looks good from my side.

This revision is now accepted and ready to land.Jan 13 2017, 9:37 AM
appcs updated this revision to Diff 84419.Jan 13 2017, 7:06 PM
appcs edited edge metadata.
  • Added documentation to source code on -mergefunc-preserve-debug-info behaviour and noted the difference from the base -mergefunc
  • Elaborated & consolidated comments in regards MergeFunctionsPDI for MergeFunctions::writeThunk()
  • Made suggested stylistic changes.

Is this OK to commit?

Thanks very much for all the inputs and review feedback.

appcs marked 2 inline comments as done.Jan 13 2017, 7:09 PM

There are several options I could think of:
Either surfacing it through a driver option, enabling it at -Og, or benchmarking it and arguing to enable it by default at some optimization settings.

-Og?

I feel that -g ought to imply -mergefunc-preserve-debug-info - would you agree?

There are several options I could think of:
Either surfacing it through a driver option, enabling it at -Og, or benchmarking it and arguing to enable it by default at some optimization settings.

-Og?

Og is the new optimization level which is "optimize while preserving the best debug experience"

I feel that -g ought to imply -mergefunc-preserve-debug-info - would you agree?

I don't see in this patch how is it supposed to trigger as I don't see an API to enable/disable this behavior (mergefunc-preserve-debug-info).
Also we need to think about LTO (which is likely a very good candidate for mergefunc, so should be considered high priority), and so embed this as a function attribute of some sort.

friss edited edge metadata.Jan 13 2017, 8:01 PM

I feel that -g ought to imply -mergefunc-preserve-debug-info - would you agree?

While I understand the intent, I feel rather strongly against that. A compilation with or without -g should produce the same code.

I agree with @friss , there should no codegen differences with and without -g. There's been discussion on llvm-dev a while ago which came to the same conclusion (https://groups.google.com/forum/#!topic/llvm-dev/PEphavxH-Ok)

I feel that -g ought to imply -mergefunc-preserve-debug-info - would you agree?

Note there is a strict rule that enabling debug info may not affect code generation and that (ideally) a stripped binary created with -h should be identical to a binary compiled without -g. If it isn't, that is considered a bug in LLVM.

Thank you, everybody, for the valuable inputs.

I understand the the codegen consistency [-g, -g0] requirement; I agree.

Would it be fair to say that -mergefunc-preserve-debug-info is more suited to being placed under –Og?

I did not follow the "embed as a function attribute" part; please could you elaborate?

I will submit a subsequent patch (upon this current base patch) that implements an API to enable/disable this behaviour
(-mergefunc-preserve-debug-info) in the near future. Would that be alright?

Is the current base patch (Diff 84419) OK to be committed?

There are several options I could think of:
Either surfacing it through a driver option, enabling it at -Og, or benchmarking it and arguing to enable it by default at some optimization settings.

-Og?

Og is the new optimization level which is "optimize while preserving the best debug experience"

I feel that -g ought to imply -mergefunc-preserve-debug-info - would you agree?

I don't see in this patch how is it supposed to trigger as I don't see an API to enable/disable this behavior (mergefunc-preserve-debug-info).
Also we need to think about LTO (which is likely a very good candidate for mergefunc, so should be considered high priority), and so embed this as a function attribute of some sort.

@mehdi_amini, is this something you would like to see in this patch before it is committed, or in a follow-up patch?

@mehdi_amini, is this something you would like to see in this patch before it is committed, or in a follow-up patch?

The patch is good as is. I was just hinting about how to make it "usable" for users in the future.

@mehdi_amini, is this something you would like to see in this patch before it is committed, or in a follow-up patch?

The patch is good as is

-> I didn't mean I reviewed it.

@mehdi_amini, is this something you would like to see in this patch before it is committed, or in a follow-up patch?

The patch is good as is

-> I didn't mean I reviewed it.

That's fine, I had already signed off the algorithm some time ago. Integrating it into clang / the LTO pipeline can be a separate commit IMHO.
@appcs this is good to go, but there should be a separate discussion/review for enabling the new pass.

This revision was automatically updated to reflect the committed changes.