This is an archive of the discontinued LLVM Phabricator instance.

Avoid emitting redundant or unusable directories in DIFile metadata entries
ClosedPublic

Authored by aprantl on Nov 29 2018, 3:28 PM.

Details

Summary

As discussed on llvm-dev today, Clang currently emits redundant directories in DIFile entries, such as

.file 1 "/Volumes/Data/llvm" "/Volumes/Data/llvm/tools/clang/test/CodeGen/debug-info-abspath.c"

This patch looks at any common prefix between the compilation directory and the (absolute) file path and strips the redundant part. More importantly it leaves the compilation directory empty if the two paths have no common prefix.

After this patch the above entry is (assuming a compilation dir of "/Volumes/Data/llvm/_build"):

.file 1 "/Volumes/Data/llvm" "tools/clang/test/CodeGen/debug-info-abspath.c"

When building the FileCheck binary with debug info, this patch makes the build artifacts ~1kb smaller.

Diff Detail

Event Timeline

aprantl created this revision.Nov 29 2018, 3:28 PM
dblaikie accepted this revision.Nov 29 2018, 3:33 PM

Seems OK to me

This revision is now accepted and ready to land.Nov 29 2018, 3:33 PM
aprantl updated this revision to Diff 175982.Nov 29 2018, 3:59 PM

Bugfix for LexicalBlockFiles + testcase updates.

aprantl updated this revision to Diff 176148.Nov 30 2018, 9:31 AM
aprantl added a reviewer: davide.

Turns out that my patch had an unintended interaction with the backend diagnostics engine. This is an updated version of the patch that also updates the backend diagnostics engine to still emit the same diagnostics as before even with the more efficient DIFile encoding.

The backend diagnostics call back into the frontend to emit nicely formatted diagnostics, but my initial CFE patch broke this mechanism for files with an absolute path. This is now fixed by first looking up the relative path in the FileManager and then falling back to the absolute path, which we construct by joining the DIFile's directory and filename.

aprantl updated this revision to Diff 176152.Nov 30 2018, 9:33 AM

Remove debugging code accidentally left in the patch.

Adding a few more reviewers since I'm touching the backend diagnostics. The backend change is supposed to be NFC, but it never hurts to have more feedback.

When building the FileCheck binary with debug info, this patch makes the build artifacts ~1kb smaller.

Nice!

lib/IR/DiagnosticInfo.cpp
39

Do we use a case-sensitive sort of include files? I thought it was insensitive. Of course the coding standard doesn't say. :-P

aprantl marked an inline comment as done.Nov 30 2018, 10:16 AM
aprantl added inline comments.
lib/IR/DiagnosticInfo.cpp
39

I just ran clang-format without thinking about this...

probinson added inline comments.Nov 30 2018, 10:58 AM
lib/IR/DiagnosticInfo.cpp
39

"Don't argue with clang-format." Okay.

davide accepted this revision.Dec 3 2018, 9:15 AM

LGTM, sorry. for the delay.

This revision was automatically updated to reflect the committed changes.
aprantl updated this revision to Diff 176677.Dec 4 2018, 10:58 AM
aprantl added a reviewer: ilya-biryukov.

Ilya, this updated revision should restore the original GCOV behavior both for absolute and relative paths.

It looks like the integrate went smoothly with the new change, thanks for fixing it!

Ah, sorry, I incorrectly checked that the old revision landed without problems. Have no data on whether the new revision breaks anything, will have to wait for the input from the US timezone buildcop

Hello @aprantl,

This change broke the test suite when building in /tmp (tmpfs) on Linux:

FAIL: Clang :: CodeGen/debug-info-abspath.c (5676 of 44360)

  • TEST 'Clang :: CodeGen/debug-info-abspath.c' FAILED ****

Script:

: 'RUN: at line 1'; /tmp/_t/bin/clang -cc1 -internal-isystem /tmp/_t/lib/clang/8.0.0/include -nostdsysteminc -debug-info-kind=limited -triple x86_64-unknown-linux-gnu /home/dave/s/lc/clang/test/CodeGen/debug-info-abspath.c -emit-llvm -o - | /tmp/_t/bin/FileCheck /home/dave/s/lc/clang/test/CodeGen/debug-info-abspath.c
: 'RUN: at line 4'; cp /home/dave/s/lc/clang/test/CodeGen/debug-info-abspath.c /tmp/_t/tools/clang/test/CodeGen/Output/debug-info-abspath.c.tmp.c

: 'RUN: at line 5'; /tmp/_t/bin/clang -cc1 -internal-isystem /tmp/_t/lib/clang/8.0.0/include -nostdsysteminc -debug-info-kind=limited -triple x86_64-unknown-linux-gnu /tmp/_t/tools/clang/test/CodeGen/Output/debug-info-abspath.c.tmp.c -emit-llvm -o - | /tmp/_t/bin/FileCheck /home/dave/s/lc/clang/test/CodeGen/debug-info-abspath.c --check-prefix=INTREE

Exit Code: 1

Command Output (stderr):

/home/dave/s/lc/clang/test/CodeGen/debug-info-abspath.c:13:16: error: CHECK-SAME: expected string not found in input
// CHECK-SAME: directory: "{{.+}}")

^

<stdin>:25:81: note: scanning from here
!7 = !DIFile(filename: "/home/dave/s/lc/clang/test/CodeGen/debug-info-abspath.c", directory: "")

^

<stdin>:25:83: note: possible intended match here
!7 = !DIFile(filename: "/home/dave/s/lc/clang/test/CodeGen/debug-info-abspath.c", directory: "")

  ^
nhaehnle removed a subscriber: nhaehnle.Dec 7 2018, 8:11 AM

Thanks, should be fixed in r348610!

Hi,
A late question about this change. I notice that this change sometimes gives me additional DIFiles in the clang output compared to before.
E.g. if I have a file

/tmp/bar/foo.c

containing just

void foo() {
}

and I stand in /tmp/ and do

clang -emit-llvm -S -g /tmp/bar/foo.c -o -

then I get two DIFiles in the output:

!0 = distinct !DICompileUnit(language: DW_LANG_C99, file: !1, producer: "clang version 8.0.0 (trunk 348738) (llvm/trunk 348737)", isOptimized: false, runtimeVersion: 0, emissionKind: FullDebug, enums: !2, nameTableKind: None)
!1 = !DIFile(filename: "/tmp/bar/foo.c", directory: "/tmp")
[...]
!8 = !DIFile(filename: "bar/foo.c", directory: "/tmp")

where !1 is only used in !DICompileUnit and !8 is used everywhere else. Before this change the only DIFile I got was

!1 = !DIFile(filename: "/tmp/bar/foo.c", directory: "/tmp")

so in the clang output we actually got more tuff with the change than before.

Also, for my out-of-tree target the two DIFiles later caused confusion for gdb since there then was a mismatch between the compilation unit name
in the debug line section and in the debug info section. Due to that mismatch gdb dropped the prologue_end so in cases where foo had a more interesting body
I would stop at the wrong place if I set a breakpoint with
break foo

I haven't seen the problem with a breakpoint ending up at the wrong place when compiling for X86 though and I'm not sure
if it's because I haven't tried hard enough or if it's something special with my backend.

Anyway, is the above as expected or should the filename in !1 be shortened as well?