This is an archive of the discontinued LLVM Phabricator instance.

Paths

llvm/trunk/lib/Transforms/IPO/MergeFunctions.cpp
llvm/trunk/test/Transforms/MergeFunc/mergefunc-preserve-debug-info.ll

Differential D28075

MergeFunctions: Preserve debug info in thunks, under option -mergefunc-preserve-debug-info
ClosedPublic

Authored by appcs on Dec 22 2016, 7:41 PM.


Details

Reviewers

dblaikie
friss
eeckstein
echristo
aprantl

Commits

rG910dc8de3f39: MergeFunctions: Preserve debug info in thunks, under option -mergefunc-preserve…
rL292702: MergeFunctions: Preserve debug info in thunks, under option -mergefunc-preserve…

Summary

Under -mergefunc-preserve-debug-info MergeFunctions::writeThunk():

Does not create a new function for the thunk.
Retains the debug info (and associated instructions) in the entry block for the thunk's parameters. PS: -debug will display the algorithm at work.
Creates debug-info for the call (to the "master-copy") made by the thunk and its return value.
Does not mark the call made by the thunk as a tail-call, so that the backtrace indicates the execution flow at -O0.
Erases the rest of the function, retaining the (minimally sized) entry block.

GDB (7.11.1) features verified to work at [-O0, -O2]:
(gdb) info functions # Thunked function will now be listed.
(gdb) step <thunked function> # Reaches the "master-copy" function that the thunk calls.
(gdb) backtrace # When inside the thunk, will show call chain leading up to the thunk.
                # When we step into the master-copy function called by the thunk, will
                # show call chain leading up to the master-copy (including thunk, one-above, at -O0),
                # but PS: at -O<non-zero> the call does get tail-call optimized, so will not show, as generally.
(gdb) finish # When inside the thunk and when inside the master-copy when called from the thunk.
(gdb) break # On thunked function.
            # On master-copy function, breaks on newly inserted call made from within thunk.
            # On call-site of thunked function in another translation unit (not the TU of its definition).

Diff Detail

Repository: rL LLVM

Event Timeline

appcs updated this revision to Diff 82397. Dec 22 2016, 7:41 PM

appcs retitled this revision from to MergeFunctions: Preserve debug info in thunks, under option -mergefunc-preserve-debug-info.

appcs updated this object.

appcs added reviewers: echristo, aprantl, friss, eeckstein.

appcs added a subscriber: llvm-commits.

Herald added subscribers: mehdi_amini, jfb. Dec 22 2016, 7:41 PM

Thanks for your patch! It's not my area of expertise but I think the size of the test case could be reduced quite a bit.

test/Transforms/MergeFunc/thunk-debugability.ll
23 ↗	(On Diff #82397)	Could the size of dotSumA and dotSumB be reduced and still illustrate the problem?
225 ↗	(On Diff #82397)	Could the function here just take a, b and c as parameters to make it simple? Also, do we need the debug intrinsics here?
259 ↗	(On Diff #82397)	Can we get rid of this?
261 ↗	(On Diff #82397)	I think you don't need attributes for the test case.
269 ↗	(On Diff #82397)	You should be able to get rid of most of the debug metadata I think.
435 ↗	(On Diff #82397)	I think people tend to stick that at the top of the file (after the RUN lines)
476 ↗	(On Diff #82397)	It looks like this is not part of the test file.

Thank you for working on this! Below are a couple of questions to help my understanding:

Under -mergefunc-preserve-debug-info MergeFunctions::writeThunk():
Does not create a new function for the thunk.
Retains the debug info (and associated instructions) in the entry block for the thunk's parameters. PS: -debug will display the algorithm at work.

So if I understand correctly, this patch causes debug info for the function arguments to be preserved in the thunk for the merged function. Will the call sites of the merged function point to the thunk or directly to the shared implementation? Or is the thunk only there to be invoked from the expression evaluator and other translation units?

Creates debug-info for the call (to the "master-copy") made by the thunk and its return value.

So in the example in the testcase, if I'm stopped at y = dotSumB(c, a, b, 8); in the debugger I would see:

bt
dotSumA() // confusing
dotSumB() // the thunk with the debug info
main()

Does not mark the call made by the thunk as a tail-call, so that the backtrace indicates the execution flow at -O0.

Is compiling with function merging and -O0 a combination used by anyone in practice?
Wouldn't it be *better* to use a tail-call in the thunk, so we don't see the misleading debug info for dotSumA()?

(I think you gave the answer for this later on: it will get auto-converted to a tail call, but it never hurts to clarify).

Adrian

lib/Transforms/IPO/MergeFunctions.cpp
234 ↗	(On Diff #82397)	Nitpick: missing `.` (and remainder of sentence?) at the end.
test/Transforms/MergeFunc/thunk-debugability.ll
441 ↗	(On Diff #82397)	Is all of this complexity necessary, or could the code be reduced further (e.g., by just doing the sum of p and eliminating q, etc...)?

aprantl added a reviewer: dblaikie. Jan 3 2017, 10:24 AM

appcs added inline comments. Jan 3 2017, 3:37 PM

lib/Transforms/IPO/MergeFunctions.cpp
699 ↗	(On Diff #82397)	Noting that I'll change: "() {\n" -> "()\n" in the re-spin.

Thank you for your review, Adrian.

True, under -mergefunc-preserve-debug-info the existing debug info (from the entry block) for the merged function's arguments is
preserved.

Call sites of the merged function occurring from within the TU of the merged function's definition will point directly to the shared
implementation and call sites of the merged function that are external to the TU of the merged function's definition call the thunk
for the merged function (which tail calls the shared implementation passing forward the incoming arguments). This is existing
behaviour under -mergefunc and remains unchanged under -mergefunc-preserve-debug-info (except for the tail call part).

So, given two source files, thunk-debugability.c & thunk-debugability-aux.c with content as follows:

thunk-debugability.c {
      1 int sumA(int *a, int size) {
      2   int i, s;
      3   for (i = 0, s = 0; i < size; i++)
      4     s += a[i];
      5   return s;
      6 }
      7
      8 int sumB(int *a, int size) {
      9   int i, s;
     10   for (i = 0, s = 0; i < size; i++)
     11     s += a[i];
     12   return s;
     13 }
     14
     15 extern int sumExternalTest(int *p, int size);
     16
     17 int main(void) {
     18
     19   int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
     20
     21   int x, y;
     22
     23   x = sumA(a, 8);
     24   y = sumB(a, 8);
     25   sumExternalTest(a, 8);
     26
     27   return (x == y) ? 0 : -1;
     28 }
}

thunk-debugability-aux.c {
      1 extern int sumA(int *a, int size);
      2 extern int sumB(int *a, int size);
      3
      4 int sumExternalTest(int *p, int size) {
      5
      6   int x, y;
      7
      8   x = sumA(p, 8);
      9   y = sumB(p, 8);
     10
     11   return (x == y ? x : -1);
     12 }
}

Under -mergefunc (i.e. existing behaviour) {
  sumA() [shared implementation] remains as per the definition given by the user
  sumB() [merged function] as defined by the user is erased and a new sumB() is created containing a tail call to sumA()
  The call made by main() to sumB() is replaced by a call to sumA()
  The call to sumB() made by the external caller sumExternalTest() will call the thunk sumB() and in turn, sumB() will tail call sumA()
}

Under -mergefunc -mergefunc-preserve-debug-info (i.e. new behaviour) {
  sumA() [shared implementation] remains as per the definition given by the user
  sumB() [merged function] as defined by the user is retained, but is transformed as follows: {
    The debug info for the arguments in the entry block is preserved
    A call is made to sumA() passing forward the arguments received by sumB()
    Debug info is created for the call to sumA() and its return value
    The call to sumA() is not marked as a tail call [PS: Needs elaboration/clarification]
    The rest of the CFG for sumB() is erased
  }
  The call made by main() to sumB() is replaced by a call to sumA()
  The call to sumB() made by the external caller sumExternalTest() will call the thunk sumB() and in turn, sumB() will call sumA()
}
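The behaviour described above can be pictured at source level with a small C model. This is only a sketch under stated assumptions: the pass works on LLVM IR, not on C source, and the names thunk_calls, same_tu_caller, external_caller, thunk_call_count and check_first_revision_model are hypothetical helpers introduced here to observe which calls go through the thunk. It models the behaviour as first described in this revision, where a call site in the defining TU is redirected to sumA() and only an external caller reaches the thunk sumB().

```c
#include <assert.h>

/* Shared implementation: kept exactly as the user wrote it. */
int sumA(int *a, int size) {
  int i, s;
  for (i = 0, s = 0; i < size; i++)
    s += a[i];
  return s;
}

/* Hypothetical counter, present only to observe whether the thunk ran. */
static int thunk_calls = 0;

/* sumB after merging: the loop is gone and the function only forwards
   its incoming arguments to sumA(). */
int sumB(int *a, int size) {
  thunk_calls++;
  return sumA(a, size);
}

/* Caller in the defining TU: the call site that read sumB(a, 8) in the
   source is modelled as rewritten to sumA(a, 8), bypassing the thunk. */
int same_tu_caller(int *a) {
  int x = sumA(a, 8);
  int y = sumA(a, 8); /* was: sumB(a, 8) */
  return (x == y) ? 0 : -1;
}

/* Caller standing in for another TU: it still calls the thunk sumB(). */
int external_caller(int *p) {
  int x = sumA(p, 8);
  int y = sumB(p, 8);
  return (x == y) ? x : -1;
}

/* Number of times the thunk has been entered so far. */
int thunk_call_count(void) { return thunk_calls; }

/* Self-check of the model: only the external caller enters the thunk. */
void check_first_revision_model(void) {
  int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
  thunk_calls = 0;
  assert(same_tu_caller(a) == 0);
  assert(thunk_calls == 0);
  assert(external_caller(a) == 36);
  assert(thunk_calls == 1);
}
```

Calling check_first_revision_model() exercises both callers; the counter stays at zero for the same-TU caller and becomes one for the external caller, which mirrors the two call paths listed above.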

The rationale for the new behaviour is that under -mergefunc-preserve-debug-info it is possible to step from the call site of the
merged function into its thunk and from there, into the shared implementation with the backtrace truly indicating the execution 
flow. Once that new flow (which is the essence of -mergefunc modulo the tail call, which is just optimization) is understood and
debugged, the user can recompile with -mergefunc-preserve-debug-info removed, if the need be.

With the above example source code (compiled at -O0 -mergefunc -mergefunc-preserve-debug-info):

Backtraces: {

  With call to sumA() not marked as a tail call in thunk sumB(): {

    Setting a breakpoint at a thunked function call site within the TU of its definition: {
      (gdb) break thunk-debugability.c:24
      Breakpoint 1 at 0x400609: file ./thunk-debugability.c, line 24.
      (gdb) run
      Starting program: /auto/compiler-migration/anmparal/code/upstreaming/MFI/thunk-debugability.mfig.exe

      Breakpoint 1, main () at ./thunk-debugability.c:24
      24        y = sumB(a, 8);
      (gdb) step
      sumA (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:3
      3         for (i = 0, s = 0; i < size; i++)
      (gdb) bt
      #0  sumA (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:3
      #1  0x000000000040060e in main () at ./thunk-debugability.c:24
      (gdb)
    }

    Setting a breakpoint at a thunked function call site outside the TU of its definition: {
      (gdb) break thunk-debugability-aux.c:9
      Breakpoint 2 at 0x400507: file ./thunk-debugability-aux.c, line 9.
      (gdb) run
      The program being debugged has been started already.
      Start it from the beginning? (y or n) y
      Starting program: /auto/compiler-migration/anmparal/code/upstreaming/MFI/thunk-debugability.mfig.exe

      Breakpoint 2, sumExternalTest (p=0x7fffffffd6d0, size=8) at ./thunk-debugability-aux.c:9
      9         y = sumB(p, 8);
      (gdb) step
      sumB (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:8
      8       int sumB(int *a, int size)
      (gdb) bt
      #0  sumB (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:8
      #1  0x0000000000400510 in sumExternalTest (p=0x7fffffffd6d0, size=8) at ./thunk-debugability-aux.c:9
      #2  0x000000000040061f in main () at ./thunk-debugability.c:25
      (gdb) step
      sumA (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:3
      3         for (i = 0, s = 0; i < size; i++)
      (gdb) bt
      #0  sumA (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:3
      #1  0x00000000004005a4 in sumB (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:8
      #2  0x0000000000400510 in sumExternalTest (p=0x7fffffffd6d0, size=8) at ./thunk-debugability-aux.c:9
      #3  0x000000000040061f in main () at ./thunk-debugability.c:25
      (gdb)
    }
  }

  With call to sumA() marked as a tail call in thunk sumB(): {

    Setting a breakpoint at a thunked function call site within the TU of its definition: {
      (gdb) break thunk-debugability.c:24
      Breakpoint 1 at 0x400609: file ./thunk-debugability.c, line 24.
      (gdb) run
      Starting program: /auto/compiler-migration/anmparal/code/upstreaming/MFI/thunk-debugability.mfig.exe

      Breakpoint 1, main () at ./thunk-debugability.c:24
      24        y = sumB(a, 8);
      (gdb) step
      sumA (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:3
      3         for (i = 0, s = 0; i < size; i++)
      (gdb) bt
      #0  sumA (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:3
      #1  0x000000000040060e in main () at ./thunk-debugability.c:24
      (gdb)
    }

    Setting a breakpoint at a thunked function call site outside the TU of its definition: {
      (gdb) break thunk-debugability-aux.c:9
      Breakpoint 2 at 0x400507: file ./thunk-debugability-aux.c, line 9.
      (gdb) run
      Starting program: /auto/compiler-migration/anmparal/code/upstreaming/MFI/thunk-debugability.mfig.exe

      Breakpoint 2, sumExternalTest (p=0x7fffffffd6d0, size=8) at ./thunk-debugability-aux.c:9
      9         y = sumB(p, 8);
      (gdb) step
      sumB (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:8
      8       int sumB(int *a, int size)
      (gdb) bt
      #0  sumB (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:8
      #1  0x0000000000400510 in sumExternalTest (p=0x7fffffffd6d0, size=8) at ./thunk-debugability-aux.c:9
      #2  0x000000000040061f in main () at ./thunk-debugability.c:25
      (gdb) step
      sumA (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:1
      1       int sumA(int *a, int size)
      (gdb) bt
      #0  sumA (a=0x7fffffffd6d0, size=8) at ./thunk-debugability.c:1
      #1  0x0000000000400510 in sumExternalTest (p=0x7fffffffd6d0, size=8) at ./thunk-debugability-aux.c:9
      #2  0x000000000040061f in main () at ./thunk-debugability.c:25
      (gdb)
    }
  }
}

So, these two scenarios have identical behaviour: {
  With call to sumA() not marked as a tail call in thunk sumB() and setting a breakpoint at a thunked function call site within the TU of its definition.
  With call to sumA()     marked as a tail call in thunk sumB() and setting a breakpoint at a thunked function call site within the TU of its definition.
}

Whereas, in: {
  With call to sumA() not marked as a tail call in thunk sumB() and setting a breakpoint at a thunked function call site outside the TU of its definition.
  With call to sumA()     marked as a tail call in thunk sumB() and setting a breakpoint at a thunked function call site outside the TU of its definition.
}
- the behaviour differs: When the call to sumA() is not marked as a tail call, the execution flow is clear (from call site, to thunk,
to shared implementation)  but when the call to sumA() is marked as a tail call, note how across the step, sumB() is ("suddenly")
replaced by sumA()
Which behaviour do we feel is better for the debug experience?
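The difference between the two backtraces for the external call site can be modelled with an explicit frame stack. This is a simplified sketch, not how gdb or the code generator work internally: struct call_stack, do_call, do_tail_call, backtrace_depth and has_frame are hypothetical names, and the only assumption is that a regular call adds a frame while a tail call reuses the caller's frame.

```c
#include <assert.h>
#include <string.h>

#define MAX_FRAMES 8

/* Hypothetical model of a call stack: frames[0] is the outermost caller. */
struct call_stack {
  const char *frames[MAX_FRAMES];
  int depth;
};

/* A regular call pushes a new frame above the caller's frame. */
static void do_call(struct call_stack *st, const char *callee) {
  assert(st->depth < MAX_FRAMES);
  st->frames[st->depth++] = callee;
}

/* A tail call reuses the caller's frame: the callee replaces the top entry. */
static void do_tail_call(struct call_stack *st, const char *callee) {
  assert(st->depth > 0);
  st->frames[st->depth - 1] = callee;
}

/* Builds the stack at the innermost point of
   main -> sumExternalTest -> sumB -> sumA and returns its depth. */
int backtrace_depth(int thunk_uses_tail_call, struct call_stack *st) {
  st->depth = 0;
  do_call(st, "main");
  do_call(st, "sumExternalTest");
  do_call(st, "sumB");
  if (thunk_uses_tail_call)
    do_tail_call(st, "sumA");
  else
    do_call(st, "sumA");
  return st->depth;
}

/* Returns nonzero if the named function currently has a frame. */
int has_frame(const struct call_stack *st, const char *name) {
  for (int i = 0; i < st->depth; i++)
    if (strcmp(st->frames[i], name) == 0)
      return 1;
  return 0;
}
```

With thunk_uses_tail_call set to 0 the model holds four frames, matching the #0 to #3 backtrace shown earlier; with it set to 1 the sumB frame is replaced by sumA and three frames remain, matching the #0 to #2 backtrace.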

I need to clarify: It is not true that the call to sumA() will be auto-converted into a tail call at -O<non-zero>
I apologize for making a misleading statement; I did not realize that I was passing an already -O<non-zero>'ed LLVM IR file to
opt -O<non-zero> -mergefunc -mergefunc-preserve-debug-info

So, do we want to conditionally tail call the shared implementation?
[debug]      At -O0          -mergefunc -mergefunc-preserve-debug-info - it is an actual call
[production] At -O<non-zero> -mergefunc -mergefunc-preserve-debug-info - it is a  tail   call
- or do we just unconditionally keep the tail call, as-(currently)-is?

--

Thank you and @fhahn for the review; Please let me know which way we want to resolve the question of the tail call and
I'll re-spin, with the test reworked with the above (keeping a single TU) example code, incorporating the other feedback.

Anmol.

Call sites of the merged function occurring from within the TU of the merged function's definition will point directly to the shared implementation and call sites of the merged function that are external to the TU of the merged function's definition call the thunk for the merged function (which tail calls the shared implementation passing forward the incoming arguments). This is existing behaviour under -mergefunc and remains unchanged under -mergefunc-preserve-debug-info (except for the tail call part).

Is this the right thing to do, though? I would think that for better debugability users would always prefer the version with the thunk even in the current TU. Then again, subsequent optimization passes would likely inline the thunk and thus potentially undo most of the effect. Have you experimented with this variant? Would the debug info from the thunk survive?


Thank you; I agree that this greatly enhances the debug experience. Automatically modifying the call site with a direct call to the shared implementation instead of (what is now) the thunk, when within the TU, is confusing when debugging. Stepping into the thunk from the call site and from within the thunk, into the shared implementation keeps the debug experience uniform for internal and external call sites of functions that become thunks. (If you ask me, this is similar in spirit to not marking the thunk's call to the shared implementation as a tail call to help debugging - the full back trace is the same in both cases and otherwise, there is a certain element of surprise in both cases. Basically, we are saying that under -mergefunc-preserve-debug-info we trade off optimization partly (at -O0) to aid debugability, modifying the transformation slightly, to make the execution flow explicit). The same question remains for thunk call sites within the TU; do we do this:

[debug] At -O0 -mergefunc -mergefunc-preserve-debug-info - call thunk even though within the same TU
[production] At -O<non-zero> -mergefunc -mergefunc-preserve-debug-info - call shared implementation [as-(currently)-is]

MergeFunctions runs pretty much at the end of the sequence of optimization phases (only prior to Loop sinking and Instruction Simplify), so inlining has already happened at the call-sites.

I did verify that (when the thunk does not tail call the shared implementation), the preserved debug info survives even when merge-functions is done prior to inlining: {

{

 1 int sumA(int *a, int size) {
 2   int i, s;
 3   for (i = 0, s = 0; i < size; i++)
 4     s += a[i];
 5   return s;
 6 }
 7 
 8 int sumB(int *a, int size) {
 9   int i, s;
10   for (i = 0, s = 0; i < size; i++)
11     s += a[i];
12   return s;
13 }
14 
15 int main(void) {
16 
17   int a[8] = { 1, 2, 3, 4, 5, 6, 7, 8 };
18 
19   int x;
20 
21   x = sumB(a, 8);
22 
23   return 0;
24 }

}

define i32 @sumB(i32* %a, i32 %size) #0 !dbg !44 {

...// inlined code from sumA()
store i32* %a, i32** %a.addr, align 8
call void @llvm.dbg.declare(metadata i32** %a.addr, metadata !50, metadata !13), !dbg !51
store i32 %size, i32* %size.addr, align 4
call void @llvm.dbg.declare(metadata i32* %size.addr, metadata !52, metadata !13), !dbg !53
...// rest of inlined code from sumA()

}

define i32 @main() #0 !dbg !69 {

entry:
 ...// inlined code from sumB()
 %a.addr.i = alloca i32*, align 8
 call void @llvm.dbg.declare(metadata i32** %a.addr.i, metadata !50, metadata !13), !dbg !78
 %size.addr.i = alloca i32, align 4
 call void @llvm.dbg.declare(metadata i32* %size.addr.i, metadata !52, metadata !13), !dbg !79
 ...// rest of inlined code from sumB()

}

...

!50 = !DILocalVariable(name: "a", arg: 1, scope: !44, file: !7, line: 8, type: !11)
!51 = !DILocation(line: 8, column: 15, scope: !44)
!52 = !DILocalVariable(name: "size", arg: 2, scope: !44, file: !7, line: 8, type: !10)
!53 = !DILocation(line: 8, column: 22, scope: !44)
...
!78 = !DILocation(line: 8, column: 15, scope: !44, inlinedAt: !74)
!79 = !DILocation(line: 8, column: 22, scope: !44, inlinedAt: !74)

...

}

Also, when the thunk does tail call the shared implementation, the exact same LLVM IR as above is generated (i.e. as when the thunk does not tail call the shared implementation).

In D28075#637667, @appcs wrote:

In D28075#637188, @aprantl wrote:

Call sites of the merged function occurring from within the TU of the merged function's definition will point directly to the shared implementation and call sites of the merged function that are external to the TU of the merged function's definition call the thunk for the merged function (which tail calls the shared implementation passing forward the incoming arguments). This is existing behaviour under -mergefunc and remains unchanged under -mergefunc-preserve-debug-info (except for the tail call part).

Is this the right thing to do, though? I would think that for better debugability users would always prefer the version with the thunk even in the current TU. Then again, subsequent optimization passes would likely inline the thunk and thus potentially undo most of the effect. Have you experimented with this variant? Would the debug info from the thunk survive?

Thank you; I agree that this greatly enhances the debug experience. Automatically modifying the call site with a direct call to the shared implementation instead of (what is now) the thunk, when within the TU, is confusing when debugging. Stepping into the thunk from the call site and from within the thunk, into the shared implementation keeps the debug experience uniform for internal and external call sites of functions that become thunks. (If you ask me, this is similar in spirit to not marking the thunk's call to the shared implementation as a tail call to help debugging - the full back trace is the same in both cases and otherwise, there is a certain element of surprise in both cases. Basically, we are saying that under -mergefunc-preserve-debug-info we trade off optimization partly (at -O0) to aid debugability, modifying the transformation slightly, to make the execution flow explicit).

Yes. I think this is the right approach. When using -mergefunc-preserve-debug-info we should always call the thunk even in the same TU.

The same question remains for thunk call sites within the TU; do we do this:

[debug] At -O0 -mergefunc -mergefunc-preserve-debug-info - call thunk even though within the same TU

Is this a scenario that anyone would use in practice, or did you just add that for comparison?

[production] At -O<non-zero> -mergefunc -mergefunc-preserve-debug-info - call shared implementation [as-(currently)-is]

MergeFunctions runs pretty much at the end of the sequence of optimization phases (only prior to Loop sinking and Instruction Simplify), so inlining has already happened at the call-sites.

Oh interesting. I (never having studied the PassManager before) was expecting the transformations to run in some kind of fix-point iteration.


In D28075#637947, @aprantl wrote:

In D28075#637667, @appcs wrote:

In D28075#637188, @aprantl wrote:

Call sites of the merged function occurring from within the TU of the merged function's definition will point directly to the shared implementation and call sites of the merged function that are external to the TU of the merged function's definition call the thunk for the merged function (which tail calls the shared implementation passing forward the incoming arguments). This is existing behaviour under -mergefunc and remains unchanged under -mergefunc-preserve-debug-info (except for the tail call part).

Is this the right thing to do, though? I would think that for better debugability users would always prefer the version with the thunk even in the current TU. Then again, subsequent optimization passes would likely inline the thunk and thus potentially undo most of the effect. Have you experimented with this variant? Would the debug info from the thunk survive?

Thank you; I agree that this greatly enhances the debug experience. Automatically modifying the call site with a direct call to the shared implementation instead of (what is now) the thunk, when within the TU, is confusing when debugging. Stepping into the thunk from the call site and from within the thunk, into the shared implementation keeps the debug experience uniform for internal and external call sites of functions that become thunks. (If you ask me, this is similar in spirit to not marking the thunk's call to the shared implementation as a tail call to help debugging - the full back trace is the same in both cases and otherwise, there is a certain element of surprise in both cases. Basically, we are saying that under -mergefunc-preserve-debug-info we trade off optimization partly (at -O0) to aid debugability, modifying the transformation slightly, to make the execution flow explicit).

Yes. I think this is the right approach. When using -mergefunc-preserve-debug-info we should always call the thunk even in the same TU.

Thank you for confirming.

What should -mergefunc-preserve-debug-info do about the thunk's call to the shared implementation?
Mark as tail call or not? The existing -mergefunc behaviour is to mark it as a tail call. We could leave it as is,
unless someone specifically asks for a change under -mergefunc-preserve-debug-info; would that be OK?

The same question remains for thunk call sites within the TU; do we do this:

[debug] At -O0 -mergefunc -mergefunc-preserve-debug-info - call thunk even though within the same TU

Is this a scenario that anyone would use in practice, or did you just add that for comparison?

I have not added it yet; I'm thinking that for now, we'll not add the optimization level to the consideration under -mergefunc-preserve-debug-info unless some real user specifically asks for it. Does that sound reasonable?

[production] At -O<non-zero> -mergefunc -mergefunc-preserve-debug-info - call shared implementation [as-(currently)-is]

MergeFunctions runs pretty much at the end of the sequence of optimization phases (only prior to Loop sinking and Instruction Simplify), so inlining has already happened at the call-sites.

Oh interesting. I (never having studied the PassManager before) was expecting the transformations to run in some kind of fix-point iteration.

I was looking at lib/Transforms/IPO/PassManagerBuilder.cpp/PassManagerBuilder::populateModulePassManager()
and saw it listed immediately above those passes, but I was not aware of:
-debug-pass=Structure - print pass structure before run()
which lists Merge Functions running pretty much as one of the last passes at -O2, but the tail bit is:

Merge Functions                                                                   
Print module to stderr                                                            
FunctionPass Manager                                                              
  Module Verifier                                                                 
  Print function to stderr                                                        
Bitcode Writer

I found that http://llvm.org/docs/CommandGuide/opt.html says:
"The order in which the options occur on the command line are the order in which they are executed (within pass constraints)."
so, that is actually how I got -inline to run after -mergefunc for the experiment.

Thank you.

In D28075#638014, @appcs wrote:

What should -mergefunc-preserve-debug-info do about the thunk's call to the shared implementation?
Mark as tail call or not? The existing -mergefunc behaviour is to mark it as a tail call. We could leave it as is,
unless someone specifically asks for a change under -mergefunc-preserve-debug-info; would that be OK?

I think marking it as a tail call makes most sense.


ACK; thanks.

Under -mergefunc-preserve-debug-info

A thunk's call site is preserved to point to the thunk when both occur within the same translation unit.

A thunk makes a tail call to the shared implementation (i.e. keep -mergefunc defined base behaviour).
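The two points above can be sketched as a source-level C model. This is an illustration only: the pass operates on LLVM IR, the counter thunk_calls and the helpers same_tu_caller and thunk_call_count are hypothetical, and C itself does not guarantee that the call in return position is emitted as a tail call.

```c
#include <assert.h>

/* Hypothetical counter, present only to observe whether the thunk ran. */
static int thunk_calls = 0;

/* Shared implementation, unchanged from the user's definition. */
int sumA(int *a, int size) {
  int i, s;
  for (i = 0, s = 0; i < size; i++)
    s += a[i];
  return s;
}

/* Thunk: forwards its arguments; the call sits in tail position, which
   is where the pass would mark the IR call as a tail call. */
int sumB(int *a, int size) {
  thunk_calls++;
  return sumA(a, size);
}

/* Caller in the same TU: under -mergefunc-preserve-debug-info the call
   site keeps naming sumB(), so the thunk is entered. */
int same_tu_caller(int *a) {
  int x = sumA(a, 8);
  int y = sumB(a, 8); /* call site is preserved, not redirected to sumA() */
  return (x == y) ? 0 : -1;
}

/* Number of times the thunk has been entered so far. */
int thunk_call_count(void) { return thunk_calls; }
```

After one call to same_tu_caller() the counter reads one, showing that a same-TU call site reaches the thunk first and the shared implementation second.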

Review comments have been incorporated; in particular, the test case has been reworked to:

Use smaller & simpler functions that illustrate the problem.
The debug intrinsics are minimal.
Unnecessary debug metadata has been removed.
Attributes have been removed.

Submitting ticked off items.

Is this good?

aprantl added a subscriber: davide. Jan 12 2017, 9:51 AM

Added some more stylistic changes.

Do you have any plans for surfacing this in clang?

lib/Transforms/IPO/MergeFunctions.cpp
632 ↗	(On Diff #83501)	http://llvm.org/docs/CodingStandards.html#doxygen-use-in-documentation-comments Don’t duplicate the documentation comment in the header file and in the implementation file. Put the documentation comments for public APIs into the header file. Documentation comments for private APIs can go to the implementation file. In any case, implementation files can include additional comments (not necessarily in Doxygen markup) to explain implementation details as needed.
695 ↗	(On Diff #83501)	if (DIS)

In D28075#644203, @aprantl wrote:

Added some more stylistic changes.

Thank you; I will incorporate.

Do you have any plans for surfacing this in clang?

Did you mean adding a clang option, say -fmerge-functions-preserve-debug-info?
(That would also necessitate a prerequisite: -fmerge-functions.) I'll submit these clang patches subsequently.
Please let me know of your suggestions for the names; I tend to go for really long names.
Thank you.

There are several options I could think of:
Either surfacing it through a driver option, enabling it at -Og, or benchmarking it and arguing to enable it by default at some optimization settings.

One final request: there should be some documentation in the source code that explains what the trade-offs of enabling this option are, and what exactly it is doing.

Otherwise this looks good from my side.

This revision is now accepted and ready to land. Jan 13 2017, 9:37 AM

Added documentation to source code on -mergefunc-preserve-debug-info behaviour and noted the difference from the base -mergefunc
Elaborated & consolidated comments regarding MergeFunctionsPDI for MergeFunctions::writeThunk()
Made suggested stylistic changes.

Is this OK to commit?

Thanks very much for all the inputs and review feedback.

In D28075#645429, @aprantl wrote:

There are several options I could think of:
Either surfacing it through a driver option, enabling it at -Og, or benchmarking it and arguing to enable it by default at some optimization settings.

-Og?

I feel that -g ought to imply -mergefunc-preserve-debug-info - would you agree?

In D28075#646196, @appcs wrote:

In D28075#645429, @aprantl wrote:

There are several options I could think of:
Either surfacing it through a driver option, enabling it at -Og, or benchmarking it and arguing to enable it by default at some optimization settings.

-Og?

Og is the new optimization level which is "optimize while preserving the best debug experience"

I feel that -g ought to imply -mergefunc-preserve-debug-info - would you agree?

I don't see in this patch how it is supposed to trigger, as I don't see an API to enable/disable this behavior (mergefunc-preserve-debug-info).
Also we need to think about LTO (which is likely a very good candidate for mergefunc, so should be considered high priority), and so embed this as a function attribute of some sort.

In D28075#646196, @appcs wrote:

I feel that -g ought to imply -mergefunc-preserve-debug-info - would you agree?

While I understand the intent, I feel rather strongly against that. A compilation with or without -g should produce the same code.

I agree with @friss, there should be no codegen differences with and without -g. There's been discussion on llvm-dev a while ago which came to the same conclusion (https://groups.google.com/forum/#!topic/llvm-dev/PEphavxH-Ok)

I feel that -g ought to imply -mergefunc-preserve-debug-info - would you agree?

Note there is a strict rule that enabling debug info may not affect code generation and that (ideally) a stripped binary created with -g should be identical to a binary compiled without -g. If it isn't, that is considered a bug in LLVM.

Thank you, everybody, for the valuable inputs.

I understand the codegen consistency [-g, -g0] requirement; I agree.

Would it be fair to say that -mergefunc-preserve-debug-info is more suited to being placed under -Og?

I did not follow the "embed as a function attribute" part; please could you elaborate?

I will submit a subsequent patch (upon this current base patch) that implements an API to enable/disable this behaviour
(-mergefunc-preserve-debug-info) in the near future. Would that be alright?

Is the current base patch (Diff 84419) OK to be committed?

In D28075#646233, @mehdi_amini wrote:

In D28075#646196, @appcs wrote:

In D28075#645429, @aprantl wrote:

There are several options I could think of:
Either surfacing it through a driver option, enabling it at -Og, or benchmarking it and arguing to enable it by default at some optimization settings.

-Og?

Og is the new optimization level which is "optimize while preserving the best debug experience"

I feel that -g ought to imply -mergefunc-preserve-debug-info - would you agree?

I don't see in this patch how it is supposed to trigger, as I don't see an API to enable/disable this behavior (mergefunc-preserve-debug-info).
Also we need to think about LTO (which is likely a very good candidate for mergefunc, so should be considered high priority), and so embed this as a function attribute of some sort.

@mehdi_amini, is this something you would like to see in this patch before it is committed, or in a follow-up patch?

In D28075#651966, @aprantl wrote:

@mehdi_amini, is this something you would like to see in this patch before it is committed, or in a follow-up patch?

The patch is good as is. I was just hinting about how to make it "usable" for users in the future.

In D28075#652044, @mehdi_amini wrote:

In D28075#651966, @aprantl wrote:

@mehdi_amini, is this something you would like to see in this patch before it is committed, or in a follow-up patch?

The patch is good as is

-> I didn't mean I reviewed it.

In D28075#652048, @mehdi_amini wrote:

In D28075#652044, @mehdi_amini wrote:

In D28075#651966, @aprantl wrote:

@mehdi_amini, is this something you would like to see in this patch before it is committed, or in a follow-up patch?

The patch is good as is

-> I didn't mean I reviewed it.

That's fine, I had already signed off the algorithm some time ago. Integrating it into clang / the LTO pipeline can be a separate commit IMHO.
@appcs this is good to go, but there should be a separate discussion/review for enabling the new pass.

Closed by commit rL292702: MergeFunctions: Preserve debug info in thunks, under option -mergefunc-preserve… (authored by anmol). Jan 20 2017, 6:13 PM

This revision was automatically updated to reflect the committed changes.

Revision Contents

Path	Size
llvm/trunk/lib/Transforms/IPO/MergeFunctions.cpp	270 lines
llvm/trunk/test/Transforms/MergeFunc/mergefunc-preserve-debug-info.ll	223 lines

Diff 85228

llvm/trunk/lib/Transforms/IPO/MergeFunctions.cpp

[90 lines not shown]

#include "llvm/ADT/Hashing.h"		#include "llvm/ADT/Hashing.h"
#include "llvm/ADT/STLExtras.h"		#include "llvm/ADT/STLExtras.h"
#include "llvm/ADT/SmallSet.h"		#include "llvm/ADT/SmallSet.h"
#include "llvm/ADT/Statistic.h"		#include "llvm/ADT/Statistic.h"
#include "llvm/IR/CallSite.h"		#include "llvm/IR/CallSite.h"
#include "llvm/IR/Constants.h"		#include "llvm/IR/Constants.h"
#include "llvm/IR/DataLayout.h"		#include "llvm/IR/DataLayout.h"
		#include "llvm/IR/DebugInfo.h"
#include "llvm/IR/IRBuilder.h"		#include "llvm/IR/IRBuilder.h"
#include "llvm/IR/Instructions.h"		#include "llvm/IR/Instructions.h"
		#include "llvm/IR/IntrinsicInst.h"
#include "llvm/IR/LLVMContext.h"		#include "llvm/IR/LLVMContext.h"
#include "llvm/IR/Module.h"		#include "llvm/IR/Module.h"
#include "llvm/IR/ValueHandle.h"		#include "llvm/IR/ValueHandle.h"
#include "llvm/IR/ValueMap.h"		#include "llvm/IR/ValueMap.h"
#include "llvm/Pass.h"		#include "llvm/Pass.h"
#include "llvm/Support/CommandLine.h"		#include "llvm/Support/CommandLine.h"
#include "llvm/Support/Debug.h"		#include "llvm/Support/Debug.h"
#include "llvm/Support/ErrorHandling.h"		#include "llvm/Support/ErrorHandling.h"
[13 lines not shown]

static cl::opt<unsigned> NumFunctionsForSanityCheck(		static cl::opt<unsigned> NumFunctionsForSanityCheck(
"mergefunc-sanity",		"mergefunc-sanity",
cl::desc("How many functions in module could be used for "		cl::desc("How many functions in module could be used for "
"MergeFunctions pass sanity check. "		"MergeFunctions pass sanity check. "
"'0' disables this check. Works only with '-debug' key."),		"'0' disables this check. Works only with '-debug' key."),
cl::init(0), cl::Hidden);		cl::init(0), cl::Hidden);

		// Under option -mergefunc-preserve-debug-info we:
		// - Do not create a new function for a thunk.
		// - Retain the debug info for a thunk's parameters (and associated
		// instructions for the debug info) from the entry block.
		// Note: -debug will display the algorithm at work.
		// - Create debug-info for the call (to the shared implementation) made by
		// a thunk and its return value.
		// - Erase the rest of the function, retaining the (minimally sized) entry
		// block to create a thunk.
		// - Preserve a thunk's call site to point to the thunk even when both occur
		// within the same translation unit, to aid debugability. Note that this
		// behaviour differs from the underlying -mergefunc implementation which
		// modifies the thunk's call site to point to the shared implementation
		// when both occur within the same translation unit.
		static cl::opt<bool>
		MergeFunctionsPDI("mergefunc-preserve-debug-info", cl::Hidden,
		cl::init(false),
		cl::desc("Preserve debug info in thunk when mergefunc "
		"transformations are made."));

namespace {		namespace {

class FunctionNode {		class FunctionNode {
mutable AssertingVH<Function> F;		mutable AssertingVH<Function> F;
FunctionComparator::FunctionHash Hash;		FunctionComparator::FunctionHash Hash;
public:		public:
// Note the hash is recalculated potentially multiple times, but it is cheap.		// Note the hash is recalculated potentially multiple times, but it is cheap.
FunctionNode(Function *F)		FunctionNode(Function *F)
[72 lines not shown]	private:
/// Merge two equivalent functions. Upon completion, G may be deleted, or may		/// Merge two equivalent functions. Upon completion, G may be deleted, or may
/// be converted into a thunk. In either case, it should never be visited		/// be converted into a thunk. In either case, it should never be visited
/// again.		/// again.
void mergeTwoFunctions(Function *F, Function *G);		void mergeTwoFunctions(Function *F, Function *G);

/// Replace G with a thunk or an alias to F. Deletes G.		/// Replace G with a thunk or an alias to F. Deletes G.
void writeThunkOrAlias(Function *F, Function *G);		void writeThunkOrAlias(Function *F, Function *G);

/// Replace G with a simple tail call to bitcast(F). Also replace direct uses		/// Fill PDIUnrelatedWL with instructions from the entry block that are
/// of G with bitcast(F). Deletes G.		/// unrelated to parameter related debug info.
		void filterInstsUnrelatedToPDI(BasicBlock *GEntryBlock,
		std::vector<Instruction *> &PDIUnrelatedWL);

		/// Erase the rest of the CFG (i.e. barring the entry block).
		void eraseTail(Function *G);

		/// Erase the instructions in PDIUnrelatedWL as they are unrelated to the
		/// parameter debug info, from the entry block.
		void eraseInstsUnrelatedToPDI(std::vector<Instruction *> &PDIUnrelatedWL);

		/// Replace G with a simple tail call to bitcast(F). Also (unless
		/// MergeFunctionsPDI holds) replace direct uses of G with bitcast(F),
		/// delete G.
void writeThunk(Function *F, Function *G);		void writeThunk(Function *F, Function *G);

/// Replace G with an alias to F. Deletes G.		/// Replace G with an alias to F. Deletes G.
void writeAlias(Function *F, Function *G);		void writeAlias(Function *F, Function *G);

/// Replace function F with function G in the function tree.		/// Replace function F with function G in the function tree.
void replaceFunctionInTree(const FunctionNode &FN, Function *G);		void replaceFunctionInTree(const FunctionNode &FN, Function *G);

[228 lines not shown]	static Value *createCast(IRBuilder<> &Builder, Value *V, Type *DestTy) {
if (SrcTy->isIntegerTy() && DestTy->isPointerTy())		if (SrcTy->isIntegerTy() && DestTy->isPointerTy())
return Builder.CreateIntToPtr(V, DestTy);		return Builder.CreateIntToPtr(V, DestTy);
else if (SrcTy->isPointerTy() && DestTy->isIntegerTy())		else if (SrcTy->isPointerTy() && DestTy->isIntegerTy())
return Builder.CreatePtrToInt(V, DestTy);		return Builder.CreatePtrToInt(V, DestTy);
else		else
return Builder.CreateBitCast(V, DestTy);		return Builder.CreateBitCast(V, DestTy);
}		}

// Replace G with a simple tail call to bitcast(F). Also replace direct uses		// Erase the instructions in PDIUnrelatedWL as they are unrelated to the
// of G with bitcast(F). Deletes G.		// parameter debug info, from the entry block.
		void MergeFunctions::eraseInstsUnrelatedToPDI(
		std::vector<Instruction *> &PDIUnrelatedWL) {

		DEBUG(dbgs() << " Erasing instructions (in reverse order of appearance in "
		"entry block) unrelated to parameter debug info from entry "
		"block: {\n");
		while (!PDIUnrelatedWL.empty()) {
		Instruction *I = PDIUnrelatedWL.back();
		DEBUG(dbgs() << " Deleting Instruction: ");
		DEBUG(I->print(dbgs()));
		DEBUG(dbgs() << "\n");
		I->eraseFromParent();
		PDIUnrelatedWL.pop_back();
		}
		DEBUG(dbgs() << " } // Done erasing instructions unrelated to parameter "
		"debug info from entry block. \n");
		}

		// Reduce G to its entry block.
		void MergeFunctions::eraseTail(Function *G) {

		std::vector<BasicBlock *> WorklistBB;
		for (Function::iterator BBI = std::next(G->begin()), BBE = G->end();
		BBI != BBE; ++BBI) {
		BBI->dropAllReferences();
		WorklistBB.push_back(&*BBI);
		}
		while (!WorklistBB.empty()) {
		BasicBlock *BB = WorklistBB.back();
		BB->eraseFromParent();
		WorklistBB.pop_back();
		}
		}

		// We are interested in the following instructions from the entry block as being
		// related to parameter debug info:
		// - @llvm.dbg.declare
		// - stores from the incoming parameters to locations on the stack-frame
		// - allocas that create these locations on the stack-frame
		// - @llvm.dbg.value
		// - the entry block's terminator
		// The rest are unrelated to debug info for the parameters; fill up
		// PDIUnrelatedWL with such instructions.
		void MergeFunctions::filterInstsUnrelatedToPDI(
		BasicBlock *GEntryBlock, std::vector<Instruction *> &PDIUnrelatedWL) {

		std::set<Instruction *> PDIRelated;
		for (BasicBlock::iterator BI = GEntryBlock->begin(), BIE = GEntryBlock->end();
		BI != BIE; ++BI) {
		if (auto *DVI = dyn_cast<DbgValueInst>(&*BI)) {
		DEBUG(dbgs() << " Deciding: ");
		DEBUG(BI->print(dbgs()));
		DEBUG(dbgs() << "\n");
		DILocalVariable *DILocVar = DVI->getVariable();
		if (DILocVar->isParameter()) {
		DEBUG(dbgs() << " Include (parameter): ");
		DEBUG(BI->print(dbgs()));
		DEBUG(dbgs() << "\n");
		PDIRelated.insert(&*BI);
		} else {
		DEBUG(dbgs() << " Delete (!parameter): ");
		DEBUG(BI->print(dbgs()));
		DEBUG(dbgs() << "\n");
		}
		} else if (auto *DDI = dyn_cast<DbgDeclareInst>(&*BI)) {
		DEBUG(dbgs() << " Deciding: ");
		DEBUG(BI->print(dbgs()));
		DEBUG(dbgs() << "\n");
		DILocalVariable *DILocVar = DDI->getVariable();
		if (DILocVar->isParameter()) {
		DEBUG(dbgs() << " Parameter: ");
		DEBUG(DILocVar->print(dbgs()));
		AllocaInst *AI = dyn_cast_or_null<AllocaInst>(DDI->getAddress());
		if (AI) {
		DEBUG(dbgs() << " Processing alloca users: ");
		DEBUG(dbgs() << "\n");
		for (User *U : AI->users()) {
		if (StoreInst *SI = dyn_cast<StoreInst>(U)) {
		if (Value *Arg = SI->getValueOperand()) {
		if (dyn_cast<Argument>(Arg)) {
		DEBUG(dbgs() << " Include: ");
		DEBUG(AI->print(dbgs()));
		DEBUG(dbgs() << "\n");
		PDIRelated.insert(AI);
		DEBUG(dbgs() << " Include (parameter): ");
		DEBUG(SI->print(dbgs()));
		DEBUG(dbgs() << "\n");
		PDIRelated.insert(SI);
		DEBUG(dbgs() << " Include: ");
		DEBUG(BI->print(dbgs()));
		DEBUG(dbgs() << "\n");
		PDIRelated.insert(&*BI);
		} else {
		DEBUG(dbgs() << " Delete (!parameter): ");
		DEBUG(SI->print(dbgs()));
		DEBUG(dbgs() << "\n");
		}
		}
		} else {
		DEBUG(dbgs() << " Defer: ");
		DEBUG(U->print(dbgs()));
		DEBUG(dbgs() << "\n");
		}
		}
		} else {
		DEBUG(dbgs() << " Delete (alloca NULL): ");
		DEBUG(BI->print(dbgs()));
		DEBUG(dbgs() << "\n");
		}
		} else {
		DEBUG(dbgs() << " Delete (!parameter): ");
		DEBUG(BI->print(dbgs()));
		DEBUG(dbgs() << "\n");
		}
		} else if (dyn_cast<TerminatorInst>(BI) == GEntryBlock->getTerminator()) {
		DEBUG(dbgs() << " Will Include Terminator: ");
		DEBUG(BI->print(dbgs()));
		DEBUG(dbgs() << "\n");
		PDIRelated.insert(&*BI);
		} else {
		DEBUG(dbgs() << " Defer: ");
		DEBUG(BI->print(dbgs()));
		DEBUG(dbgs() << "\n");
		}
		}
		DEBUG(dbgs()
		<< " Report parameter debug info related/unrelated instructions: {\n");
		for (BasicBlock::iterator BI = GEntryBlock->begin(), BE = GEntryBlock->end();
		BI != BE; ++BI) {

		Instruction *I = &*BI;
		if (PDIRelated.find(I) == PDIRelated.end()) {
		DEBUG(dbgs() << " !PDIRelated: ");
		DEBUG(I->print(dbgs()));
		DEBUG(dbgs() << "\n");
		PDIUnrelatedWL.push_back(I);
		} else {
		DEBUG(dbgs() << " PDIRelated: ");
		DEBUG(I->print(dbgs()));
		DEBUG(dbgs() << "\n");
		}
		}
		DEBUG(dbgs() << " }\n");
		}

		// Replace G with a simple tail call to bitcast(F). Also (unless
		// MergeFunctionsPDI holds) replace direct uses of G with bitcast(F),
		// delete G. Under MergeFunctionsPDI, we use G itself for creating
		// the thunk as we preserve the debug info (and associated instructions)
		// from G's entry block pertaining to G's incoming arguments which are
		// passed on as corresponding arguments in the call that G makes to F.
		// For better debugability, under MergeFunctionsPDI, we do not modify G's
		// call sites to point to F even when within the same translation unit.
void MergeFunctions::writeThunk(Function *F, Function *G) {		void MergeFunctions::writeThunk(Function *F, Function *G) {
if (!G->isInterposable()) {		if (!G->isInterposable() && !MergeFunctionsPDI) {
// Redirect direct callers of G to F.		// Redirect direct callers of G to F. (See note on MergeFunctionsPDI
		// above).
replaceDirectCallers(G, F);		replaceDirectCallers(G, F);
}		}

// If G was internal then we may have replaced all uses of G with F. If so,		// If G was internal then we may have replaced all uses of G with F. If so,
// stop here and delete G. There's no need for a thunk.		// stop here and delete G. There's no need for a thunk. (See note on
if (G->hasLocalLinkage() && G->use_empty()) {		// MergeFunctionsPDI above).
		if (G->hasLocalLinkage() && G->use_empty() && !MergeFunctionsPDI) {
G->eraseFromParent();		G->eraseFromParent();
return;		return;
}		}

Function *NewG = Function::Create(G->getFunctionType(), G->getLinkage(), "",		BasicBlock *GEntryBlock = nullptr;
		std::vector<Instruction *> PDIUnrelatedWL;
		BasicBlock *BB = nullptr;
		Function *NewG = nullptr;
		if (MergeFunctionsPDI) {
		DEBUG(dbgs() << "writeThunk: (MergeFunctionsPDI) Do not create a new "
		"function as thunk; retain original: "
		<< G->getName() << "()\n");
		GEntryBlock = &G->getEntryBlock();
		DEBUG(dbgs() << "writeThunk: (MergeFunctionsPDI) filter parameter related "
		"debug info for "
		<< G->getName() << "() {\n");
		filterInstsUnrelatedToPDI(GEntryBlock, PDIUnrelatedWL);
		GEntryBlock->getTerminator()->eraseFromParent();
		BB = GEntryBlock;
		} else {
		NewG = Function::Create(G->getFunctionType(), G->getLinkage(), "",
G->getParent());		G->getParent());
BasicBlock *BB = BasicBlock::Create(F->getContext(), "", NewG);		BB = BasicBlock::Create(F->getContext(), "", NewG);
IRBuilder<> Builder(BB);		}

		IRBuilder<> Builder(BB);
		Function *H = MergeFunctionsPDI ? G : NewG;
SmallVector<Value *, 16> Args;		SmallVector<Value *, 16> Args;
unsigned i = 0;		unsigned i = 0;
FunctionType *FFTy = F->getFunctionType();		FunctionType *FFTy = F->getFunctionType();
for (Argument & AI : NewG->args()) {		for (Argument & AI : H->args()) {
Args.push_back(createCast(Builder, &AI, FFTy->getParamType(i)));		Args.push_back(createCast(Builder, &AI, FFTy->getParamType(i)));
++i;		++i;
}		}

CallInst *CI = Builder.CreateCall(F, Args);		CallInst *CI = Builder.CreateCall(F, Args);
		ReturnInst *RI = nullptr;
CI->setTailCall();		CI->setTailCall();
CI->setCallingConv(F->getCallingConv());		CI->setCallingConv(F->getCallingConv());
CI->setAttributes(F->getAttributes());		CI->setAttributes(F->getAttributes());
if (NewG->getReturnType()->isVoidTy()) {		if (H->getReturnType()->isVoidTy()) {
Builder.CreateRetVoid();		RI = Builder.CreateRetVoid();
} else {		} else {
Builder.CreateRet(createCast(Builder, CI, NewG->getReturnType()));		RI = Builder.CreateRet(createCast(Builder, CI, H->getReturnType()));
}		}

		if (MergeFunctionsPDI) {
		DISubprogram *DIS = G->getSubprogram();
		if (DIS) {
		DebugLoc CIDbgLoc = DebugLoc::get(DIS->getScopeLine(), 0, DIS);
		DebugLoc RIDbgLoc = DebugLoc::get(DIS->getScopeLine(), 0, DIS);
		CI->setDebugLoc(CIDbgLoc);
		RI->setDebugLoc(RIDbgLoc);
		} else {
		DEBUG(dbgs() << "writeThunk: (MergeFunctionsPDI) No DISubprogram for "
		<< G->getName() << "()\n");
		}
		eraseTail(G);
		eraseInstsUnrelatedToPDI(PDIUnrelatedWL);
		DEBUG(dbgs() << "} // End of parameter related debug info filtering for: "
		<< G->getName() << "()\n");
		} else {
NewG->copyAttributesFrom(G);		NewG->copyAttributesFrom(G);
NewG->takeName(G);		NewG->takeName(G);
removeUsers(G);		removeUsers(G);
G->replaceAllUsesWith(NewG);		G->replaceAllUsesWith(NewG);
G->eraseFromParent();		G->eraseFromParent();
		}

DEBUG(dbgs() << "writeThunk: " << NewG->getName() << '\n');		DEBUG(dbgs() << "writeThunk: " << H->getName() << '\n');
++NumThunksWritten;		++NumThunksWritten;
}		}

// Replace G with an alias to F and delete G.		// Replace G with an alias to F and delete G.
void MergeFunctions::writeAlias(Function *F, Function *G) {		void MergeFunctions::writeAlias(Function *F, Function *G) {
auto *GA = GlobalAlias::create(G->getLinkage(), "", F);		auto *GA = GlobalAlias::create(G->getLinkage(), "", F);
F->setAlignment(std::max(F->getAlignment(), G->getAlignment()));		F->setAlignment(std::max(F->getAlignment(), G->getAlignment()));
GA->takeName(G);		GA->takeName(G);
[151 lines not shown]
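
The entry-block handling in filterInstsUnrelatedToPDI() and eraseInstsUnrelatedToPDI() can be hard to follow through the DEBUG output, so here is a minimal standalone sketch of the same two-pass shape. It is a toy model under stated assumptions, not the patch's code: ToyInst, filterUnrelated, eraseUnrelated and toyMaxBEntry are hypothetical names, instructions are plain strings, and the "keep" decision is a precomputed flag rather than the dbg.declare/dbg.value/alloca/store analysis the patch performs.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Toy stand-in for an instruction in G's entry block (not an LLVM type).
struct ToyInst {
  std::string Text;
  bool KeepForParamDebugInfo; // dbg intrinsic, alloca or store tied to a parameter
  bool IsTerminator;
};

// First pass: collect what must be kept. Second pass: list everything else,
// in order of appearance, as the worklist of instructions to erase.
std::vector<std::size_t> filterUnrelated(const std::vector<ToyInst> &Entry) {
  std::set<std::size_t> Related;
  for (std::size_t I = 0; I < Entry.size(); ++I)
    if (Entry[I].KeepForParamDebugInfo || Entry[I].IsTerminator)
      Related.insert(I);

  std::vector<std::size_t> UnrelatedWL;
  for (std::size_t I = 0; I < Entry.size(); ++I)
    if (Related.count(I) == 0)
      UnrelatedWL.push_back(I);
  return UnrelatedWL;
}

// Erase from the back of the worklist, i.e. in reverse order of appearance.
// In this toy model that keeps the remaining indices valid.
std::vector<ToyInst> eraseUnrelated(std::vector<ToyInst> Entry,
                                    std::vector<std::size_t> UnrelatedWL) {
  while (!UnrelatedWL.empty()) {
    Entry.erase(Entry.begin() +
                static_cast<std::ptrdiff_t>(UnrelatedWL.back()));
    UnrelatedWL.pop_back();
  }
  return Entry;
}

// A shortened, hypothetical version of maxB's entry block from the test case.
std::vector<ToyInst> toyMaxBEntry() {
  return {
      {"%x.addr = alloca i32", true, false},
      {"%i = alloca i32", false, false},
      {"store i32 %x, i32* %x.addr", true, false},
      {"call @llvm.dbg.declare(%x.addr)", true, false},
      {"call @llvm.dbg.declare(%i)", false, false},
      {"%0 = load i32, i32* %x.addr", false, false},
      {"br i1 %cmp", false, true},
  };
}
```

In the patch itself the terminator is kept by the filter but then erased separately in writeThunk() before the call and return are appended; the sketch stops at the filtering step. The reverse erase order in the real code presumably also ensures that users are removed before the values they use.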

llvm/trunk/test/Transforms/MergeFunc/mergefunc-preserve-debug-info.ll

				; RUN: opt -O0 -S -mergefunc -mergefunc-preserve-debug-info < %s | FileCheck %s --check-prefix=OPTIMIZATION_LEVEL_0
				; RUN: opt -O2 -S -mergefunc -mergefunc-preserve-debug-info < %s | FileCheck %s --check-prefix=OPTIMIZATION_LEVEL_2

				; Preserve debug info in thunks under -mergefunc -mergefunc-preserve-debug-info
				;
				; We test that:
				; At -O0 we have preserved the generated @llvm.dbg.declare debug intrinsics.
				; At -O2 we have preserved the generated @llvm.dbg.value debug intrinsics.
				; At -O0, stores from the incoming parameters to locations on the stack-frame
				; and allocas that create these locations on the stack-frame are preserved.
				; Debug info got generated for the call made by the thunk and for its return value.
				; The foregoing is the only content of a thunk's entry block.
				; A thunk makes a tail call to the shared implementation.
				; A thunk's call site is preserved to point to the thunk (with only -mergefunc the
				; call site is modified to point to the shared implementation) when both occur
				; within the same translation unit.

				; The source code that was used to test and generate this LLVM IR is:
				;
				; int maxA(int x, int y) {
				; int i, m, j;
				; if (x > y)
				; m = x;
				; else
				; m = y;
				; return m;
				; }
				;
				; int maxB(int x, int y) {
				; int i, m, j;
				; if (x > y)
				; m = x;
				; else
				; m = y;
				; return m;
				; }
				;
				; void f(void) {
				;
				; maxA(3, 4);
				; maxB(1, 9);
				; }

				; Function Attrs: nounwind uwtable
				define i32 @maxA(i32 %x, i32 %y) !dbg !6 {
				entry:
				%x.addr = alloca i32, align 4
				%y.addr = alloca i32, align 4
				%i = alloca i32, align 4
				%m = alloca i32, align 4
				%j = alloca i32, align 4
				store i32 %x, i32* %x.addr, align 4
				call void @llvm.dbg.declare(metadata i32* %x.addr, metadata !11, metadata !12), !dbg !13
				store i32 %y, i32* %y.addr, align 4
				call void @llvm.dbg.declare(metadata i32* %y.addr, metadata !14, metadata !12), !dbg !15
				call void @llvm.dbg.declare(metadata i32* %i, metadata !16, metadata !12), !dbg !17
				call void @llvm.dbg.declare(metadata i32* %m, metadata !18, metadata !12), !dbg !19
				call void @llvm.dbg.declare(metadata i32* %j, metadata !20, metadata !12), !dbg !21
				%0 = load i32, i32* %x.addr, align 4, !dbg !22
				%1 = load i32, i32* %y.addr, align 4, !dbg !24
				%cmp = icmp sgt i32 %0, %1, !dbg !25
				br i1 %cmp, label %if.then, label %if.else, !dbg !26

				if.then: ; preds = %entry
				%2 = load i32, i32* %x.addr, align 4, !dbg !27
				store i32 %2, i32* %m, align 4, !dbg !28
				br label %if.end, !dbg !29

				if.else: ; preds = %entry
				%3 = load i32, i32* %y.addr, align 4, !dbg !30
				store i32 %3, i32* %m, align 4, !dbg !31
				br label %if.end

				if.end: ; preds = %if.else, %if.then
				%4 = load i32, i32* %m, align 4, !dbg !32
				ret i32 %4, !dbg !33
				}

				; Function Attrs: nounwind readnone
				declare void @llvm.dbg.declare(metadata, metadata, metadata)

				; Function Attrs: nounwind uwtable
				define i32 @maxB(i32 %x, i32 %y) !dbg !34 {

				; OPTIMIZATION_LEVEL_0: define i32 @maxB(i32 %x, i32 %y)
				; OPTIMIZATION_LEVEL_0-NEXT: entry:
				; OPTIMIZATION_LEVEL_0-NEXT: %x.addr = alloca i32, align 4
				; OPTIMIZATION_LEVEL_0-NEXT: %y.addr = alloca i32, align 4
				; OPTIMIZATION_LEVEL_0-NEXT: store i32 %x, i32* %x.addr, align 4
				; OPTIMIZATION_LEVEL_0-NEXT: call void @llvm.dbg.declare(metadata i32* %x.addr, metadata !{{[0-9]+}}, metadata !{{[0-9]+}}), !dbg !{{[0-9]+}}
				; OPTIMIZATION_LEVEL_0-NEXT: store i32 %y, i32* %y.addr, align 4
				; OPTIMIZATION_LEVEL_0-NEXT: call void @llvm.dbg.declare(metadata i32* %y.addr, metadata !{{[0-9]+}}, metadata !{{[0-9]+}}), !dbg !{{[0-9]+}}
				; OPTIMIZATION_LEVEL_0-NEXT: %0 = tail call i32 @maxA(i32 %x, i32 %y), !dbg !{{[0-9]+}}
				; OPTIMIZATION_LEVEL_0-NEXT: ret i32 %0, !dbg !{{[0-9]+}}
				; OPTIMIZATION_LEVEL_0-NEXT: }

				; OPTIMIZATION_LEVEL_2: define i32 @maxB(i32 %x, i32 %y)
				; OPTIMIZATION_LEVEL_2-NEXT: entry:
				; OPTIMIZATION_LEVEL_2-NEXT: tail call void @llvm.dbg.value(metadata i32 %x, i64 0, metadata !{{[0-9]+}}, metadata !{{[0-9]+}}), !dbg !{{[0-9]+}}
				; OPTIMIZATION_LEVEL_2-NEXT: tail call void @llvm.dbg.value(metadata i32 %y, i64 0, metadata !{{[0-9]+}}, metadata !{{[0-9]+}}), !dbg !{{[0-9]+}}
				; OPTIMIZATION_LEVEL_2-NEXT: %0 = tail call i32 @maxA(i32 %x, i32 %y) #{{[0-9]+}}, !dbg !{{[0-9]+}}
				; OPTIMIZATION_LEVEL_2-NEXT: ret i32 %0, !dbg !{{[0-9]+}}
				; OPTIMIZATION_LEVEL_2-NEXT: }

				entry:
				%x.addr = alloca i32, align 4
				%y.addr = alloca i32, align 4
				%i = alloca i32, align 4
				%m = alloca i32, align 4
				%j = alloca i32, align 4
				store i32 %x, i32* %x.addr, align 4
				call void @llvm.dbg.declare(metadata i32* %x.addr, metadata !35, metadata !12), !dbg !36
				store i32 %y, i32* %y.addr, align 4
				call void @llvm.dbg.declare(metadata i32* %y.addr, metadata !37, metadata !12), !dbg !38
				call void @llvm.dbg.declare(metadata i32* %i, metadata !39, metadata !12), !dbg !40
				call void @llvm.dbg.declare(metadata i32* %m, metadata !41, metadata !12), !dbg !42
				call void @llvm.dbg.declare(metadata i32* %j, metadata !43, metadata !12), !dbg !44
				%0 = load i32, i32* %x.addr, align 4, !dbg !45
				%1 = load i32, i32* %y.addr, align 4, !dbg !47
				%cmp = icmp sgt i32 %0, %1, !dbg !48
				br i1 %cmp, label %if.then, label %if.else, !dbg !49

				if.then: ; preds = %entry
				%2 = load i32, i32* %x.addr, align 4, !dbg !50
				store i32 %2, i32* %m, align 4, !dbg !51
				br label %if.end, !dbg !52

				if.else: ; preds = %entry
				%3 = load i32, i32* %y.addr, align 4, !dbg !53
				store i32 %3, i32* %m, align 4, !dbg !54
				br label %if.end

				if.end: ; preds = %if.else, %if.then
				%4 = load i32, i32* %m, align 4, !dbg !55
				ret i32 %4, !dbg !56
				}

				; Function Attrs: nounwind uwtable
				define void @f() !dbg !57 {
				entry:

				; OPTIMIZATION_LEVEL_0: define void @f()
				; OPTIMIZATION_LEVEL_0-NEXT: entry:
				; OPTIMIZATION_LEVEL_0-NEXT: %call = call i32 @maxA(i32 3, i32 4), !dbg !{{[0-9]+}}
				; OPTIMIZATION_LEVEL_0-NEXT: %call1 = call i32 @maxB(i32 1, i32 9), !dbg !{{[0-9]+}}
				; OPTIMIZATION_LEVEL_0-NEXT: ret void, !dbg !{{[0-9]+}}

				; OPTIMIZATION_LEVEL_2: define void @f()
				; OPTIMIZATION_LEVEL_2-NEXT: entry:
				; OPTIMIZATION_LEVEL_2-NEXT: ret void, !dbg !{{[0-9]+}}

				%call = call i32 @maxA(i32 3, i32 4), !dbg !60
				%call1 = call i32 @maxB(i32 1, i32 9), !dbg !61
				ret void, !dbg !62
				}

				!llvm.dbg.cu = !{!0}
				!llvm.module.flags = !{!3, !4}
				!llvm.ident = !{!5}

				!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2)
				!1 = !DIFile(filename: "mergefunc-preserve-debug-info.c", directory: "")
				!2 = !{}
				!3 = !{i32 2, !"Dwarf Version", i32 4}
				!4 = !{i32 2, !"Debug Info Version", i32 3}
				!5 = !{!""}
				!6 = distinct !DISubprogram(name: "maxA", scope: !7, file: !7, line: 1, type: !8, isLocal: false, isDefinition: true, scopeLine: 1, flags: DIFlagPrototyped, isOptimized: false, unit: !0, variables: !2)
				!7 = !DIFile(filename: "./mergefunc-preserve-debug-info.c", directory: "")
				!8 = !DISubroutineType(types: !9)
				!9 = !{!10, !10, !10}
				!10 = !DIBasicType(name: "int", size: 32, encoding: DW_ATE_signed)
				!11 = !DILocalVariable(name: "x", arg: 1, scope: !6, file: !7, line: 1, type: !10)
				!12 = !DIExpression()
				!13 = !DILocation(line: 1, column: 14, scope: !6)
				!14 = !DILocalVariable(name: "y", arg: 2, scope: !6, file: !7, line: 1, type: !10)
				!15 = !DILocation(line: 1, column: 21, scope: !6)
				!16 = !DILocalVariable(name: "i", scope: !6, file: !7, line: 2, type: !10)
				!17 = !DILocation(line: 2, column: 7, scope: !6)
				!18 = !DILocalVariable(name: "m", scope: !6, file: !7, line: 2, type: !10)
				!19 = !DILocation(line: 2, column: 10, scope: !6)
				!20 = !DILocalVariable(name: "j", scope: !6, file: !7, line: 2, type: !10)
				!21 = !DILocation(line: 2, column: 13, scope: !6)
				!22 = !DILocation(line: 3, column: 7, scope: !23)
				!23 = distinct !DILexicalBlock(scope: !6, file: !7, line: 3, column: 7)
				!24 = !DILocation(line: 3, column: 11, scope: !23)
				!25 = !DILocation(line: 3, column: 9, scope: !23)
				!26 = !DILocation(line: 3, column: 7, scope: !6)
				!27 = !DILocation(line: 4, column: 9, scope: !23)
				!28 = !DILocation(line: 4, column: 7, scope: !23)
				!29 = !DILocation(line: 4, column: 5, scope: !23)
				!30 = !DILocation(line: 6, column: 9, scope: !23)
				!31 = !DILocation(line: 6, column: 7, scope: !23)
				!32 = !DILocation(line: 7, column: 10, scope: !6)
				!33 = !DILocation(line: 7, column: 3, scope: !6)
				!34 = distinct !DISubprogram(name: "maxB", scope: !7, file: !7, line: 10, type: !8, isLocal: false, isDefinition: true, scopeLine: 10, flags: DIFlagPrototyped, isOptimized: false, unit: !0, variables: !2)
				!35 = !DILocalVariable(name: "x", arg: 1, scope: !34, file: !7, line: 10, type: !10)
				!36 = !DILocation(line: 10, column: 14, scope: !34)
				!37 = !DILocalVariable(name: "y", arg: 2, scope: !34, file: !7, line: 10, type: !10)
				!38 = !DILocation(line: 10, column: 21, scope: !34)
				!39 = !DILocalVariable(name: "i", scope: !34, file: !7, line: 11, type: !10)
				!40 = !DILocation(line: 11, column: 7, scope: !34)
				!41 = !DILocalVariable(name: "m", scope: !34, file: !7, line: 11, type: !10)
				!42 = !DILocation(line: 11, column: 10, scope: !34)
				!43 = !DILocalVariable(name: "j", scope: !34, file: !7, line: 11, type: !10)
				!44 = !DILocation(line: 11, column: 13, scope: !34)
				!45 = !DILocation(line: 12, column: 7, scope: !46)
				!46 = distinct !DILexicalBlock(scope: !34, file: !7, line: 12, column: 7)
				!47 = !DILocation(line: 12, column: 11, scope: !46)
				!48 = !DILocation(line: 12, column: 9, scope: !46)
				!49 = !DILocation(line: 12, column: 7, scope: !34)
				!50 = !DILocation(line: 13, column: 9, scope: !46)
				!51 = !DILocation(line: 13, column: 7, scope: !46)
				!52 = !DILocation(line: 13, column: 5, scope: !46)
				!53 = !DILocation(line: 15, column: 9, scope: !46)
				!54 = !DILocation(line: 15, column: 7, scope: !46)
				!55 = !DILocation(line: 16, column: 10, scope: !34)
				!56 = !DILocation(line: 16, column: 3, scope: !34)
				!57 = distinct !DISubprogram(name: "f", scope: !7, file: !7, line: 19, type: !58, isLocal: false, isDefinition: true, scopeLine: 19, flags: DIFlagPrototyped, isOptimized: false, unit: !0, variables: !2)
				!58 = !DISubroutineType(types: !59)
				!59 = !{null}
				!60 = !DILocation(line: 21, column: 3, scope: !57)
				!61 = !DILocation(line: 22, column: 3, scope: !57)
				!62 = !DILocation(line: 23, column: 1, scope: !57)
