This is an archive of the discontinued LLVM Phabricator instance.

[DWARF] Emit DW_AT_call_pc for tail calls
ClosedPublic

Authored by vsk on Mar 17 2020, 6:05 PM.

Details

Summary

Record the address of a tail-calling branch instruction within its call
site entry using DW_AT_call_pc. This allows a debugger to determine the
address to use when creating artificial frames.

This creates an extra attribute + relocation at tail call sites, which
constitute roughly 3% and 5% of all call sites in xnu and clang, respectively.
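
For illustration, a minimal C sketch of the kind of call this targets (function names invented): at -O2 the call below typically lowers to a bare jmp, so no return address ever points back into caller, and DW_AT_call_pc records the branch instruction itself.

int callee(int x);

int caller(int x) {
  return callee(x + 1); // becomes `jmp callee`; the call site entry
                        // records this branch's address in DW_AT_call_pc
}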

rdar://60307600

Diff Detail

Event Timeline

vsk created this revision.Mar 17 2020, 6:05 PM

(general caveat: this should be committed as separate patches - between llvm and llvm-dsymutil (much like llvm-dwarfdump and llvm codegen are usually committed separately) - but yeah, it's nice to see it all together for review)

Do you happen to have numbers for, say, clang, on the growth of object size because of this - specifically I'd be interested in the growth of the .rela.debug_addr section in a DWARFv5 build.

Did I understand correctly from previous discussions that the goal was to have call_sites for /every/ call, at some point? (I'm concerned that'll be a /lot/ of addresses)

I suppose another question, I guess - how many more call_sites is this than currently/without this patch?

djtodoro added inline comments.
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
885–886

Looks like the dwarf::DW_AT_call_return_pc is enough for GDB to distinguish that even for tail calls. Can we do the same for LLDB and avoid the call_pc?

llvm/test/DebugInfo/MIR/X86/call-site-gnu-vs-dwarf5-attrs.mir
29

GCC produces this:

<2><9a>: Abbrev Number: 9 (DW_TAG_GNU_call_site)
   <9b>   DW_AT_low_pc      : 0x23
   <a3>   DW_AT_GNU_tail_call: 1
   <a3>   DW_AT_abstract_origin: <0xd4>
djtodoro added inline comments.Mar 18 2020, 1:40 AM
llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
1801

Should we use the isCandidateForCallSiteEntry() instead?

The dsymutil part LGTM.

vsk marked 2 inline comments as done.Mar 18 2020, 3:28 PM

(general caveat: this should be committed as separate patches - between llvm and llvm-dsymutil (much like llvm-dwarfdump and llvm codegen are usually committed separately) - but yeah, it's nice to see it all together for review)

Sounds good, I'll split this up before committing.

Do you happen to have numbers for, say, clang, on the growth of object size because of this - specifically I'd be interested in the growth of the .rela.debug_addr section in a DWARFv5 build.

I looked at the size impact on a Darwin -gdwarf-4 stage2 clang build. The aggregate size of .o's post-patch grows 0.006% (8367046264 bytes -> 8367544576 bytes, a 486KB increase). Tail calls make up ~5% of all calls (in clang at least), so this is in line with what was measured in D72489. In D72489, I added a relocation to every non-tail call site entry, which increased the aggregate .o size by 0.04% (~3MB).

I don't think linker support for dwarf5 is mature enough on Darwin for me to measure. Perhaps that measurement wouldn't be very useful, as ELF relocations are different.

Did I understand correctly from previous discussions that the goal was to have call_sites for /every/ call, at some point? (I'm concerned that'll be a /lot/ of addresses)

Yes. When tuning for lldb, we've been emitting call sites for every call for some time now.

Hm :(. Digging through the history, I see that I accidentally turned on DIFlagAllCallsDescribed when targeting -gdwarf-4 + GDB in D69743 (November 2019). This was done prematurely, I apologize for this. It should not have happened until the entry values feature got enabled by default. I've checked in test coverage for -gdwarf-4 + -ggdb to clang/test/CodeGenCXX/dbg-info-all-calls-described.cpp in 47622efc.

@dblaikie @djtodoro I'd prefer to leave things as they are, but am also open to disabling DIFlagAllCallsDescribed for -gdwarf-4 + -ggdb until entry values get re-enabled by default, let me know what you think.

I suppose another question, I guess - how many more call_sites is this than currently/without this patch?

This patch doesn't change the number of call site entries (DW_TAG_call_site), it just adds relocations to ~3-5% of those entries.

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
885–886

I don't think that would be valid DWARF. The return pc for a tail call isn't given by the address of the instruction after the tail-calling branch.

Are you sure GDB uses DW_AT_call_return_pc to figure out the address where the tail call was made?

1801

Yes, thanks!

llvm/test/DebugInfo/MIR/X86/call-site-gnu-vs-dwarf5-attrs.mir
29

Is the DW_AT_low_pc the address of the tail-calling branch instruction, or is it the address of the next instruction?

vsk updated this revision to Diff 251198.Mar 18 2020, 3:38 PM
vsk marked an inline comment as done.

Incorporate feedback from @djtodoro.

In D76336#1929994, @vsk wrote:

(general caveat: this should be committed as separate patches - between llvm and llvm-dsymutil (much like llvm-dwarfdump and llvm codegen are usually committed separately) - but yeah, it's nice to see it all together for review)

Sounds good, I'll split this up before committing.

Do you happen to have numbers for, say, clang, on the growth of object size because of this - specifically I'd be interested in the growth of the .rela.debug_addr section in a DWARFv5 build.

I looked at the size impact on a Darwin -gdwarf-4 stage2 clang build. The aggregate size of .o's post-patch grows 0.006% (8367046264 bytes -> 8367544576 bytes, a 486KB increase). Tail calls make up ~5% of all calls (in clang at least), so this is in line with what was measured in D72489. In D72489, I added a relocation to every non-tail call site entry, which increased the aggregate .o size by 0.04% (~3MB).

I don't think linker support for dwarf5 is mature enough on Darwin for me to measure. Perhaps that measurement wouldn't be very useful, as ELF relocations are different.

Right, an ELF measurement is what I was interested in, owing to the different relocations & .rela.debug_addr is especially significant to my use case involving split DWARF and compressed DWARF, so some of the largest remaining content is debug_addr relocations.

Did I understand correctly from previous discussions that the goal was to have call_sites for /every/ call, at some point? (I'm concerned that'll be a /lot/ of addresses)

Yes. When tuning for lldb, we've been emitting call sites for every call for some time now.

But they haven't had addresses on them?

Hmm, that doesn't seem to reproduce with a simple example:

void f1();
void f2() {
  f1();
}

Doesn't have any call_site (compiled with clang with -g -c for linux/x86 or darwin/x86).

Hm :(. Digging through the history, I see that I accidentally turned on DIFlagAllCallsDescribed when targeting -gdwarf-4 + GDB in D69743 (November 2019). This was done prematurely, I apologize for this. It should not have happened until the entry values feature got enabled by default. I've checked in test coverage for -gdwarf-4 + -ggdb to clang/test/CodeGenCXX/dbg-info-all-calls-described.cpp in 47622efc.

@dblaikie @djtodoro I'd prefer to leave things as they are, but am also open to disabling DIFlagAllCallsDescribed for -gdwarf-4 + -ggdb until entry values get re-enabled by default, let me know what you think.

Sure, happy to leave things as-is.

vsk added a comment.Mar 18 2020, 4:27 PM
In D76336#1929994, @vsk wrote:

(general caveat: this should be committed as separate patches - between llvm and llvm-dsymutil (much like llvm-dwarfdump and llvm codegen are usually committed separately) - but yeah, it's nice to see it all together for review)

Sounds good, I'll split this up before committing.

Do you happen to have numbers for, say, clang, on the growth of object size because of this - specifically I'd be interested in the growth of the .rela.debug_addr section in a DWARFv5 build.

I looked at the size impact on a Darwin -gdwarf-4 stage2 clang build. The aggregate size of .o's post-patch grows 0.006% (8367046264 bytes -> 8367544576 bytes, a 486KB increase). Tail calls make up ~5% of all calls (in clang at least), so this is in line with what was measured in D72489. In D72489, I added a relocation to every non-tail call site entry, which increased the aggregate .o size by 0.04% (~3MB).

I don't think linker support for dwarf5 is mature enough on Darwin for me to measure. Perhaps that measurement wouldn't be very useful, as ELF relocations are different.

Right, an ELF measurement is what I was interested in, owing to the different relocations & .rela.debug_addr is especially significant to my use case involving split DWARF and compressed DWARF, so some of the largest remaining content is debug_addr relocations.

I'll find a linux system and share some numbers tomorrow.

Did I understand correctly from previous discussions that the goal was to have call_sites for /every/ call, at some point? (I'm concerned that'll be a /lot/ of addresses)

Yes. When tuning for lldb, we've been emitting call sites for every call for some time now.

But they haven't had addresses on them?

They have had addresses: a DW_TAG_call_site may contain DW_AT_call_return_pc, which is an address as of D72489.

Hmm, that doesn't seem to reproduce with a simple example:

void f1();
void f2() {
  f1();
}

Doesn't have any call_site (compiled with clang with -g -c for linux/x86 or darwin/x86).

Have you tried adding '-O1 -disable-llvm-passes'? Call site entries aren't emitted unless optimization is enabled.
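
For example (a sketch; -disable-llvm-passes is a cc1 option, so it may need -Xclang when going through the driver, and the tag may print as DW_TAG_GNU_call_site depending on tuning):

clang -O1 -Xclang -disable-llvm-passes -g -c f2.c -o f2.o
llvm-dwarfdump f2.o | grep -A2 call_site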

Hm :(. Digging through the history, I see that I accidentally turned on DIFlagAllCallsDescribed when targeting -gdwarf-4 + GDB in D69743 (November 2019). This was done prematurely, I apologize for this. It should not have happened until the entry values feature got enabled by default. I've checked in test coverage for -gdwarf-4 + -ggdb to clang/test/CodeGenCXX/dbg-info-all-calls-described.cpp in 47622efc.

@dblaikie @djtodoro I'd prefer to leave things as they are, but am also open to disabling DIFlagAllCallsDescribed for -gdwarf-4 + -ggdb until entry values get re-enabled by default, let me know what you think.

Sure, happy to leave things as-is.

In D76336#1930089, @vsk wrote:
In D76336#1929994, @vsk wrote:

(general caveat: this should be committed as separate patches - between llvm and llvm-dsymutil (much like llvm-dwarfdump and llvm codegen are usually committed separately) - but yeah, it's nice to see it all together for review)

Sounds good, I'll split this up before committing.

Do you happen to have numbers for, say, clang, on the growth of object size because of this - specifically I'd be interested in the growth of the .rela.debug_addr section in a DWARFv5 build.

I looked at the size impact on a Darwin -gdwarf-4 stage2 clang build. The aggregate size of .o's post-patch grows 0.006% (8367046264 bytes -> 8367544576 bytes, a 486KB increase). Tail calls make up ~5% of all calls (in clang at least), so this is in line with what was measured in D72489. In D72489, I added a relocation to every non-tail call site entry, which increased the aggregate .o size by 0.04% (~3MB).

I don't think linker support for dwarf5 is mature enough on Darwin for me to measure. Perhaps that measurement wouldn't be very useful, as ELF relocations are different.

Right, an ELF measurement is what I was interested in, owing to the different relocations & .rela.debug_addr is especially significant to my use case involving split DWARF and compressed DWARF, so some of the largest remaining content is debug_addr relocations.

I'll find a linux system and share some numbers tomorrow.

Thanks!

Did I understand correctly from previous discussions that the goal was to have call_sites for /every/ call, at some point? (I'm concerned that'll be a /lot/ of addresses)

Yes. When tuning for lldb, we've been emitting call sites for every call for some time now.

But they haven't had addresses on them?

They have had addresses: a DW_TAG_call_site may contain DW_AT_call_return_pc, which is an address as of D72489.

"may"? or does in all cases? So this review is about changing some of those (specifically the ones on tail calls) call_return_pc to call_pc?

Hmm, that doesn't seem to reproduce with a simple example:

void f1();
void f2() {
  f1();
}

Doesn't have any call_site (compiled with clang with -g -c for linux/x86 or darwin/x86).

Have you tried adding '-O1 -disable-llvm-passes'? Call site entries aren't emitted unless optimization is enabled.

Ah, right right - thanks for the reminder. (why is this only relevant in optimized builds?) (sorry I keep repeating things with this feature - seems I still don't have it all clearly in my head :/)

djtodoro added a comment. Edited Mar 19 2020, 2:34 AM

@vsk Thanks for working on this!

@vsk wrote:
@dblaikie @djtodoro I'd prefer to leave things as they are, but am also open to disabling DIFlagAllCallsDescribed for -gdwarf-4 + -ggdb until entry values get re-enabled by default, let me know what you think.

I'd keep it as is. I am preparing the patch to enable entry values by default, since that patch was not the cause of the reported issue.

@dblaikie wrote:
why is this only relevant in optimized builds?

I think the main benefit of the call-site information is using it together with call-site parameters, which are used for computing the actual value of a parameter even when the parameter's location was not provided (optimized out). That improves the user experience when debugging optimized code. In addition, in the case of tail calls, the call_site debug info is used for printing artificial call frames for the tail calls (and tail calls are typical of optimized code?).
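
A contrived C illustration of that first point (names invented; compile with -O2 -g): once param is dead inside callee, the register holding it may be reused and its location reported as optimized out, but a call-site parameter on the call in caller lets the debugger recompute the value via entry values.

__attribute__((noinline)) int callee(int param) {
  int r = param + 1;
  // Past this point `param` may have no location; with call-site
  // parameter info a debugger can still recover it from the caller.
  return r * r;
}

int caller(void) { return callee(41); }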

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
885–886

GCC generates DW_AT_low_pc for DWARF 4 (actually the GNU extension) and DW_AT_call_return_pc for DWARF 5 on the call_site TAGs describing tail calls.
That is the address after the jump-like instruction for a tail call.

GDB understands that info and prints the artificial frames (the test case from the D76337):
gcc -g -O2 tail.c -o gcc-tail-example

...
(gdb) r
Starting program: gcc-tail-example
Breakpoint 1, sink () at tail.c:4
4         x++;
(gdb) bt
#0  sink () at tail.c:4
#1  0x00000000004004a6 in func3 () at tail.c:9
#2  0x00000000004004b7 in func2 () at tail.c:13
#3  0x00000000004004d6 in func1 () at tail.c:18
#4  0x0000000000400367 in main () at tail.c:22
...

GDB version 8.3.5
GCC version 4.9.3

The same happens when using the latest LLVM trunk (with D73534 applied), since with DWARF 4 + GDB tuning clang generates the same DWARF as GCC for this example.
clang -g -O2 tail.c -o tail-example

...
(gdb) r
Starting program: tail-example
Breakpoint 1, sink () at tail.c:4
4         x++;
(gdb) bt
#0  sink () at tail.c:4
#1  0x0000000000401149 in func3 () at tail.c:9      // the DW_AT_low_pc  is 0x401149 from the corresponding call_site
#2  0x0000000000401156 in func2 () at tail.c:13
#3  0x0000000000401169 in func1 () at tail.c:18
#4  0x0000000000401176 in main () at tail.c:22
...
llvm/test/DebugInfo/MIR/X86/call-site-gnu-vs-dwarf5-attrs.mir
29

As I mentioned above, it is the address after the tail-call instruction.

vsk marked an inline comment as done.Mar 19 2020, 3:19 PM

This patch causes a 0.7% size increase in .rela.debug_addr in a stage2 -gdwarf-5 build on Linux. That's ~569KB, similar to the size increase seen on Darwin (see my earlier comment). Here are the steps I took to measure:

I rented out a VPS for the day to gather the data -- if at all possible please lmk by EOD today if you're interested in a different experiment.

@dblaikie wrote:
why is this only relevant in optimized builds?

What @djtodoro said :). The features enabled by DW_TAG_call_site entries aren't useful/applicable at -O0. When call site entries /are/ useful, they always have to contain an address, otherwise the debugger can't figure out which call site entry to pick when stopped at a particular PC value.

Prior to this patch, we only inserted an address (DW_AT_call_return_pc) into ~95% of call site entries -- the non-tail-calling ones. We need the address of the tail-calling instruction, though, to make backtraces with artificial frames work properly.
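
As a self-contained sketch of why every useful entry carries an address (invented structures; real consumers read these attributes out of the DIE tree), the debugger matches the PC it is stopped at against the recorded addresses:

#include <cstdint>

struct CallSite {
  const CallSite *next;
  bool hasCallPC, hasReturnPC;
  uint64_t callPC, returnPC;
};

// For a regular frame the PC of interest is the saved return address
// (DW_AT_call_return_pc); for an artificial tail-call frame it is the
// branch itself (DW_AT_call_pc).
const CallSite *findCallSite(const CallSite *list, uint64_t pc) {
  for (const CallSite *cs = list; cs; cs = cs->next) {
    if (cs->hasReturnPC && cs->returnPC == pc)
      return cs;
    if (cs->hasCallPC && cs->callPC == pc)
      return cs;
  }
  return nullptr;
}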

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
885–886

Thanks for digging into this! This seems like a really weird quirk of gcc/gdb. To make this work, the debugger has to know that the DW_AT_call_return_pc for tail calls is bogus, and then find and subtract the correct offset from that address to find the true call_pc. It seems simpler to just emit the correct call_pc to begin with.

In this patch, for the case where we're tuning for gdb in dwarf4 mode, I've omitted DW_AT_call_pc (we'll keep emitting the strange DW_AT_call_return_pc -- I believe there's a test for this).

@dblaikie wrote:
why is this only relevant in optimized builds?

I think the main benefit of the call-site information is using it together with call-site parameters, which are used for computing the actual value of a parameter even when the parameter's location was not provided (optimized out).

That presents some interestingly tricky challenges. We have -fstandalone-debug for if you are building one part of a program with debug info and others without - but there's nothing equivalent for if you're building part of a program optimized but other parts unoptimized. And this heuristic (only emit these attributes for optimized code) assumes the caller and callee are both equally optimized/similarly compiled - which isn't necessarily true.

That improves the user experience when debugging optimized code. In addition, in the case of tail calls, the call_site debug info is used for printing artificial call frames for the tail calls (and tail calls are typical of optimized code?).

The tail call case is easier - since that's the caller-side. It'd probably be better to just emit that on any tail call, optimized or unoptimized code - I guess I mean, ideally the choice wouldn't be made at the frontend, but at the backend if the call ends up being a tail call.

(I'm thinking of LTO situations, attribute optnone, other things like that - the frontend doesn't really know if something is optimized or not & really you can't tell if a callee is optimized because it's in another translation unit)

But such is life/some of that has no obvious solution. Wonder what GCC does.

vsk added a comment.Mar 19 2020, 3:27 PM

@dblaikie wrote:
why is this only relevant in optimized builds?

I think the main benefit of the call-site information is using it together with call-site parameters, which are used for computing the actual value of a parameter even when the parameter's location was not provided (optimized out).

That presents some interestingly tricky challenges. We have -fstandalone-debug for if you are building one part of a program with debug info and others without - but there's nothing equivalent for if you're building part of a program optimized but other parts unoptimized. And this heuristic (only emit these attributes for optimized code) assumes the caller and callee are both equally optimized/similarly compiled - which isn't necessarily true.

That's ok though, because a debugger can handle call site entries being only partially available. I.e. there's no reason (afaik) for mixing and matching optimized/unoptimized .o's to regress debugger features enabled by call site entries.

That improves the user experience when debugging optimized code. In addition, in the case of tail calls, the call_site debug info is used for printing artificial call frames for the tail calls (and tail calls are typical of optimized code?).

The tail call case is easier - since that's the caller-side. It'd probably be better to just emit that on any tail call, optimized or unoptimized code - I guess I mean, ideally the choice wouldn't be made at the frontend, but at the backend if the call ends up being a tail call.

We only want to emit call site entries at tail-calling sites when the caller has debug info, though. We do that today by relying on the DIFlagAllCallsDescribed attribute, which the frontend provides. Also fwiw the artificial frames debugger feature doesn't work if only the tail-calling sites are described -- all of the calls have to be described for the debugger to reconstruct feasible paths through the call graph.

(I'm thinking of LTO situations, attribute optnone, other things like that - the frontend doesn't really know if something is optimized or not & really you can't tell if a callee is optimized because it's in another translation unit)

Hm, oh, good point. But, 99+% of the time, isn't the workflow to compile with -flto=thin + -O{1,2,3,s,z}? We handle that fine. But I guess if you're doing -O0 -disable-O0-optnone -flto, you wouldn't get call site entries. Hrm. Does that come up much? I guess we could fix that by adding a frontend flag?

In D76336#1932330, @vsk wrote:

@dblaikie wrote:
why is this only relevant in optimized builds?

I think the main benefit of the call-site information is using it together with call-site parameters, which are used for computing the actual value of a parameter even when the parameter's location was not provided (optimized out).

That presents some interestingly tricky challenges. We have -fstandalone-debug for if you are building one part of a program with debug info and others without - but there's nothing equivalent for if you're building part of a program optimized but other parts unoptimized. And this heuristic (only emit these attributes for optimized code) assumes the caller and callee are both equally optimized/similarly compiled - which isn't necessarily true.

That's ok though, because a debugger can handle call site entries being only partially available. I.e. there's no reason (afaik) for mixing and matching optimized/unoptimized .o's to regress debugger features enabled by call site entries.

Ah, I wasn't suggesting it'd regress functionality - but that the heuristic of "only emit these for optimized code" was just that, a heuristic with some false negatives (or positives, or whichever way you think of it). Partly then asking the question: is there a more accurate way we could determine when to emit these attributes, rather than using a frontend is/isn't optimized heuristic.

That improves the user experience when debugging optimized code. In addition, in the case of tail calls, the call_site debug info is used for printing artificial call frames for the tail calls (and tail calls are typical of optimized code?).

The tail call case is easier - since that's the caller-side. It'd probably be better to just emit that on any tail call, optimized or unoptimized code - I guess I mean, ideally the choice wouldn't be made at the frontend, but at the backend if the call ends up being a tail call.

We only want to emit call site entries at tail-calling sites when the caller has debug info, though. We do that today by relying on the DIFlagAllCallsDescribed attribute, which the frontend provides.

"when the caller has debug info" - I'm probably misunderstanding what you mean there. Of course the caller has to have debug info (a DW_TAG_subprogram for the calling function) to describe the call site, since the call site tag goes inside the caller's subprogram tag.

What I meant, not in the tail-calling case, but in other call sites, the callee might be optimized but the caller might be unoptimized, so relying on "is the caller optimized" to decide whether to describe the call site misses some cases (caller is unoptimized, callee is optimized - it'd improve the user experience if that call (to the optimized function) had a call_site with call site parameters to help debuggability of the optimized function, if I understand correctly)

Also fwiw the artificial frames debugger feature doesn't work if only the tail-calling sites are described -- all of the calls have to be described for the debugger to reconstruct feasible paths through the call graph.

(I'm thinking of LTO situations, attribute optnone, other things like that - the frontend doesn't really know if something is optimized or not & really you can't tell if a callee is optimized because it's in another translation unit)

Hm, oh, good point. But, 99+% of the time, isn't the workflow to compile with -flto=thin + -O{1,2,3,s,z}? We handle that fine. But I guess if you're doing -O0 -disable-O0-optnone -flto, you wouldn't get call site entries. Hrm. Does that come up much? I guess we could fix that by adding a frontend flag?

I was just thinking straight -O0, or attribute((optnone)) - that could produce the "caller is unoptimized but callee is optimized, so the absence of a call_site TAG is a failure of the call_site TAG heuristic (a false negative)". Not the end of the world, and I'm not sure there's a better solution than the heuristic you've got, but just articulating a problem there. How much does that come up? Well, as much as optnone/O0-preserved-through-LTO comes up, which was originally implemented for Sony - apparently their users use this as a debugging technique to maintain performance of the rest of the program while making parts of it more debuggable by using -O0.

@dblaikie wrote:

But such is life/some of that has no obvious solution. Wonder what GCC does.

Oh yes, I agree :) GCC does the same. It generates the info only in the case of "optimized" code...

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
885–886

DWARF5 + GDB tuning will have both call_pc and return_pc?

I think we should phrase the check here like this:
(IsTail && !useGNUAnalogForDwarf5Feature(DD))...
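
A standalone model of the decision being discussed (a paraphrase only -- the names mirror the review comments, not the actual DwarfDebug.cpp code, and the GNU-analog predicate is assumed to mean "tuning for GDB with DWARF < 5"):

#include <cstdio>

enum class Tuning { GDB, LLDB };

struct Ctx {
  Tuning tuning;
  unsigned dwarfVersion;
  bool useGNUAnalogForDwarf5Feature() const {
    return tuning == Tuning::GDB && dwarfVersion < 5;
  }
};

// DW_AT_call_pc is attached only for tail calls, and only when not
// falling back to the GNU attribute analogs.
bool shouldEmitCallPC(const Ctx &ctx, bool isTail) {
  return isTail && !ctx.useGNUAnalogForDwarf5Feature();
}

int main() {
  std::printf("gdb+dwarf4:  %d\n", shouldEmitCallPC({Tuning::GDB, 4}, true));  // 0
  std::printf("gdb+dwarf5:  %d\n", shouldEmitCallPC({Tuning::GDB, 5}, true));  // 1
  std::printf("lldb+dwarf4: %d\n", shouldEmitCallPC({Tuning::LLDB, 4}, true)); // 1
}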

vsk updated this revision to Diff 251768.Mar 20 2020, 2:38 PM

Per @djtodoro's suggestion, avoid emitting DW_AT_call_pc when tuning for gdb. We already emit AT_call_return_pc in this case, which seems weird/wrong to me, but that's what gdb expects to see, it seems.

There's a side discussion here about how to determine when to emit call site entries. I think that ought to be considered separately. It looks like there are compelling use cases for mixing -O0 & optimized code -- if there's enough interest in this, and the size overhead looks reasonable, we might consider turning on call site entries at -O0 (possibly gated behind a flag).

In D76336#1934599, @vsk wrote:

Per @djtodoro's suggestion, avoid emitting DW_AT_call_pc when tuning for gdb. We already emit AT_call_return_pc in this case, which seems weird/wrong to me, but that's what gdb expects to see, it seems.

Does it work for GDB? If so, how? Please be sure there's clear documentation (probably in a somewhat long explanatory comment in the LLVM source itself) about why this divergence is justified - does GDB need this for providing some functionality, but less than is possible with DW_AT_call_pc, so LLDB wants that so it can provide the better experience, but providing only that would mean GDB would provide a worse experience than if it has return_pc?

vsk added a comment.Mar 20 2020, 3:40 PM
In D76336#1934599, @vsk wrote:

Per @djtodoro's suggestion, avoid emitting DW_AT_call_pc when tuning for gdb. We already emit AT_call_return_pc in this case, which seems weird/wrong to me, but that's what gdb expects to see, it seems.

Does it work for GDB?

Yes, @djtodoro's second-to-last comment shows an example where gdb works out the address for an artificial frame using the "return pc".

If so, how? Please be sure there's clear documentation (probably in a somewhat long explanatory comment in the LLVM source itself) about why this divergence is justified

I left a comment where DW_AT_call_pc is emitted explaining the situation: 'GDB prefers to work out what the call pc is by subtracting an offset from DW_AT_low_pc/DW_AT_call_return_pc'. I'll expand on this.

does GDB need this for providing some functionality, but less than is possible with DW_AT_call_pc, so LLDB wants that so it can provide the better experience, but providing only that would mean GDB would provide a worse experience than if it has return_pc?

gdb works backwards from non-standard usage of DW_AT_low_pc/DW_AT_call_return_pc at tail-call site entries to figure out the PC of tail-calling branch instructions. This means it doesn't need the compiler to emit DW_AT_call_pc, so we don't emit it (this change happened in my last patch update -- incidentally this means this patch no longer has any effect on debug info emission when tuning for gdb).

However, there isn't a good reason to tie non-gdb debuggers to this non-standardness, as it adds unnecessary complexity to the debugger. It forces the debugger to disassemble around the fake "return pc" to find the actual tail-calling branch. In the case where the tail-calling branch is the last instruction in the function, I'm not sure how that would work, as the fake "return pc" wouldn't necessarily point anywhere meaningful. To side-step that, we emit DW_AT_call_pc for non-gdb debuggers.

In D76336#1934726, @vsk wrote:
In D76336#1934599, @vsk wrote:

Per @djtodoro's suggestion, avoid emitting DW_AT_call_pc when tuning for gdb. We already emit AT_call_return_pc in this case, which seems weird/wrong to me, but that's what gdb expects to see, it seems.

Does it work for GDB?

Yes, @djtodoro's second-to-last comment shows an example where gdb works out the address for an artificial frame using the "return pc".

If so, how? Please be sure there's clear documentation (probably in a somewhat long explanatory comment in the LLVM source itself) about why this divergence is justified

I left a comment where DW_AT_call_pc is emitted explaining the situation: 'GDB prefers to work out what the call pc is by subtracting an offset from DW_AT_low_pc/DW_AT_call_return_pc'. I'll expand on this.

Should we do that only in DWARFv4, and leave DWARFv5 standard & let GDB implement the missing functionality to understand the standard form?

does GDB need this for providing some functionality, but less than is possible with DW_AT_call_pc, so LLDB wants that so it can provide the better experience, but providing only that would mean GDB would provide a worse experience than if it has return_pc?

gdb works backwards from non-standard usage of DW_AT_low_pc/DW_AT_call_return_pc at tail-call site entries to figure out the PC of tail-calling branch instructions. This means it doesn't need the compiler to emit DW_AT_call_pc, so we don't emit it (this change happened in my last patch update -- incidentally this means this patch no longer has any effect on debug info emission when tuning for gdb).

The language in the DWARF spec is, as always, pretty vague/general. But, yes, the call_pc wording in the spec (which includes a mention of jumps/tail calls) seems to be in contrast to the call_return_pc that only mentions calls, so I get where you're coming from.

However, there isn't a good reason to tie non-gdb debuggers to this non-standardness, as it adds unnecessary complexity to the debugger. It forces the debugger to disassemble around the fake "return pc" to find the actual tail-calling branch. In the case where the tail-calling branch is the last instruction in the function, I'm not sure how that would work, as the fake "return pc" wouldn't necessarily point anywhere meaningful. To side-step that, we emit DW_AT_call_pc for non-gdb debuggers.

the "I'm not sure how that would work" bit is sort of concerning to me - it does work for GDB, right? (I'd guess it points to the end of the instruction (same as the "high_pc" of a function points to the end of (or, one passed the end in both cases) of the function))

I'm not going to/don't mean to hold any of this work up, just curious.

I guess if GDB does add support for call_pc, it'll likely keep its extension support of call_return_pc as well so it'll keep working with the non-standard emission for GDB you're proposing, and then we can fix-forward & remove this workaround. Has someone filed the bug on GDB for this missing functionality?

vsk added a comment.Mar 20 2020, 4:58 PM
In D76336#1934726, @vsk wrote:
In D76336#1934599, @vsk wrote:

Per @djtodoro's suggestion, avoid emitting DW_AT_call_pc when tuning for gdb. We already emit AT_call_return_pc in this case, which seems weird/wrong to me, but that's what gdb expects to see, it seems.

Does it work for GDB?

Yes, @djtodoro's second-to-last comment shows an example where gdb works out the address for an artificial frame using the "return pc".

If so, how? Please be sure there's clear documentation (probably in a somewhat long explanatory comment in the LLVM source itself) about why this divergence is justified

I left a comment where DW_AT_call_pc is emitted explaining the situation: 'GDB prefers to work out what the call pc is by subtracting an offset from DW_AT_low_pc/DW_AT_call_return_pc'. I'll expand on this.

Should we do that only in DWARFv4, and leave DWARFv5 standard & let GDB implement the missing functionality to understand the standard form?

Sure, I think this is a nice opportunity to reduce debugger-specific divergence. I'll include that in the next update.

does GDB need this for providing some functionality, but less than is possible with DW_AT_call_pc, so LLDB wants that so it can provide the better experience, but providing only that would mean GDB would provide a worse experience than if it has return_pc?

gdb works backwards from non-standard usage of DW_AT_low_pc/DW_AT_call_return_pc at tail-call site entries to figure out the PC of tail-calling branch instructions. This means it doesn't need the compiler to emit DW_AT_call_pc, so we don't emit it (this change happened in my last patch update -- incidentally this means this patch no longer has any effect on debug info emission when tuning for gdb).

The language in the DWARF spec is, as always, pretty vague/general. But, yes, the call_pc wording in the spec (which includes a mention of jumps/tail calls) seems to be in contrast to the call_return_pc that only mentions calls, so I get where you're coming from.

However, there isn't a good reason to tie non-gdb debuggers to this non-standardness, as it adds unnecessary complexity to the debugger. It forces the debugger to disassemble around the fake "return pc" to find the actual tail-calling branch. In the case where the tail-calling branch is the last instruction in the function, I'm not sure how that would work, as the fake "return pc" wouldn't necessarily point anywhere meaningful. To side-step that, we emit DW_AT_call_pc for non-gdb debuggers.

the "I'm not sure how that would work" bit is sort of concerning to me - it does work for GDB, right? (I'd guess it points to the end of the instruction (same as the "high_pc" of a function points to the end of (or, one passed the end in both cases) of the function))

I was hedging a bit because I haven't/can't read the gdb sources. I expect that the implementation does something to the effect of:

tail_call_pc = call_return_pc-1;
tail_call_pc -= sizeOfInstAt(tail_call_pc)

in which case handling a tail call at the end of a function shouldn't be problematic. Part of why I hedged is that I don't know what impact (if any) post-link function reordering tools might have on DW_AT_call_return_pc. Hypothetically, if the encoded address is one past the end of caller function, and the post-link tool fastidiously updates addresses in DWARF sections which point within moved functions, the non-standard trick gdb uses would stop working (otoh DW_AT_call_pc would get updated correctly).
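
Spelling that out as a self-contained, linear-sweep variant of the same guess (pure conjecture about consumer-side behavior; instLen is a hypothetical disassembler helper):

#include <cstdint>

uint64_t instLen(uint64_t pc); // hypothetical: size of the instruction at pc

// Walk forward from the function entry until an instruction ends exactly
// at the recorded "return pc"; that instruction's start is the branch.
uint64_t findTailCallPC(uint64_t funcStart, uint64_t callReturnPC) {
  for (uint64_t pc = funcStart; pc < callReturnPC;) {
    uint64_t len = instLen(pc);
    if (len == 0)
      break;                      // defensive: undecodable byte
    if (pc + len == callReturnPC)
      return pc;                  // the tail-calling branch starts here
    pc += len;
  }
  return 0; // no instruction ends at callReturnPC
}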

I'm not going to/don't mean to hold any of this work up, just curious.

I guess if GDB does add support for call_pc, it'll likely keep its extension support of call_return_pc as well so it'll keep working with the non-standard emission for GDB you're proposing, and then we can fix-forward & remove this workaround. Has someone filed the bug on GDB for this missing functionality?

I did a quick scan of the gdb bug database but couldn't find anything relevant. If anyone following along can confirm gdb hasn't added support for DW_AT_call_pc, I'd appreciate it if you could file a bug.

vsk updated this revision to Diff 251805.Mar 20 2020, 5:07 PM

Add more detailed comments, and always emit DW_AT_call_pc in DWARF5 mode (even when tuning for GDB).

does GDB need this for providing some functionality, but less than is possible with DW_AT_call_pc, so LLDB wants that so it can provide the better experience, but providing only that would mean GDB would provide a worse experience than if it has return_pc?

gdb works backwards from non-standard usage of DW_AT_low_pc/DW_AT_call_return_pc at tail-call site entries to figure out the PC of tail-calling branch instructions. This means it doesn't need the compiler to emit DW_AT_call_pc, so we don't emit it (this change happened in my last patch update -- incidentally this means this patch no longer has any effect on debug info emission when tuning for gdb).

The language in the DWARF spec is, as always, pretty vague/general. But, yes, the call_pc wording in the spec (which includes a mention of jumps/tail calls) seems to be in contrast to the call_return_pc that only mentions calls, so I get where you're coming from.

However, there isn't a good reason to tie non-gdb debuggers to this non-standardness, as it adds unnecessary complexity to the debugger. It forces the debugger to disassemble around the fake "return pc" to find the actual tail-calling branch. In the case where the tail-calling branch is the last instruction in the function, I'm not sure how that would work, as the fake "return pc" wouldn't necessarily point anywhere meaningful. To side-step that, we emit DW_AT_call_pc for non-gdb debuggers.

the "I'm not sure how that would work" bit is sort of concerning to me - it does work for GDB, right? (I'd guess it points to the end of the instruction (same as the "high_pc" of a function points to the end of (or, one passed the end in both cases) of the function))

I was hedging a bit because I haven't/can't read the gdb sources. I expect that the implementation does something to the effect of:

tail_call_pc = call_return_pc-1;
tail_call_pc -= sizeOfInstAt(tail_call_pc)

in which case handling a tail call at the end of a function shouldn't be problematic. Part of why I hedged is that I don't know what impact (if any) post-link function reordering tools might have on DW_AT_call_return_pc. Hypothetically, if the encoded address is one past the end of caller function, and the post-link tool fastidiously updates addresses in DWARF sections which point within moved functions, the non-standard trick gdb uses would stop working (otoh DW_AT_call_pc would get updated correctly).

An implementation couldn't resolve addresses like that, or it'd break the normal DW_AT_high_pc, which also points to the same address (one past the end of the function).

In D76336#1934828, @vsk wrote:

Add more detailed comments, and always emit DW_AT_call_pc in DWARF5 mode (even when tuning for GDB).

I like this. Actually, I was in favor of this option.

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
885–886

I still think we can guard the PCAddr here with (IsTail && !useGNUAnalogForDwarf5Feature(DD)), if we want to use the call_pc for gdb + dwarf5 tuning?
And also update the comment above.

vsk updated this revision to Diff 252135.Mar 23 2020, 1:59 PM
vsk marked 2 inline comments as done.

Address review feedback.

llvm/lib/CodeGen/AsmPrinter/DwarfDebug.cpp
885–886

Ah right, sorry I missed this.

djtodoro accepted this revision.Mar 24 2020, 7:40 AM

Thanks! lgtm (from my side)

This revision is now accepted and ready to land.Mar 24 2020, 7:40 AM
This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. Mar 24 2020, 12:22 PM