This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Duplicate file names in debug info
Needs ReviewPublic

Authored by kamleshbhalui on Dec 13 2019, 9:27 PM.

Details

Summary

strip/remove the dot slash before creating files for debug info.
This fixes https://bugs.llvm.org/show_bug.cgi?id=44170

I am unable to add a test for this.
since it requires to pass command like this.
$clang ./t.c -g -emit-llvm -S

Diff Detail

Event Timeline

kamleshbhalui created this revision.Dec 13 2019, 9:27 PM
Herald added a project: Restricted Project. · View Herald TranscriptDec 13 2019, 9:27 PM
dstenb added a subscriber: dstenb.Dec 14 2019, 3:41 PM

It should be possible to create a test for this using something like:

RUN: rm -rf %t && mkdir %t
RUN: cp %s %t/filename.c
RUN: cd %t
RUN: clang ./filename.c [...]

Do we have a similar problem if the filespec has an embedded ./ or ../ in it? I'm thinking some broader canonicalization ought to be done here.
$ clang ./dir1/dir2/../dir3/file.c
should resolve to dir1/dir3/file.c shouldn't it?

Do we have a similar problem if the filespec has an embedded ./ or ../ in it?

problems occur only when filespec starts with ./ otherwise it's fine.

probinson added a comment.EditedDec 16 2019, 9:16 AM

Do we have a similar problem if the filespec has an embedded ./ or ../ in it?

problems occur only when filespec starts with ./ otherwise it's fine.

So do other kinds of paths get canonicalized elsewhere? I'm thinking if there's already a place that undoes embedded .. then it should handle leading . as well.

Is a leading .. okay or a problem?

Do we have a similar problem if the filespec has an embedded ./ or ../ in it?

problems occur only when filespec starts with ./ otherwise it's fine.

So do other kinds of paths get canonicalized elsewhere? I'm thinking if there's already a place that undoes embedded .. then it should handle leading . as well.

other canonicalization happens here
https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGDebugInfo.cpp#L541
and
https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGDebugInfo.cpp#L441

but that does not undo any embedded . and .. anywhere,except leading ./

i.e. when we pass like
$clang .././p1.c -S -emit-llvm -g

IRGen gives single file entry like this
!1 = !DIFile(filename: ".././p1.c"

As you can see path is not canonicalized but it atleast does not have duplicate file entry.

Is a leading .. okay or a problem?

yes It's ok in the sense ,it does not create duplicate file entry in debug info.

Are we sure we want to canonicalize *before* applying -fdebug-prefix-map in remapDIPath()? Honest question, the answer could be yes :-)

clang/lib/CodeGen/CGDebugInfo.cpp
412

I believe that this check is redundant and also part of remove_leading_dotslash?

Do we have a similar problem if the filespec has an embedded ./ or ../ in it?

problems occur only when filespec starts with ./ otherwise it's fine.

So do other kinds of paths get canonicalized elsewhere? I'm thinking if there's already a place that undoes embedded .. then it should handle leading . as well.

other canonicalization happens here
https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGDebugInfo.cpp#L541
and
https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGDebugInfo.cpp#L441

but that does not undo any embedded . and .. anywhere,except leading ./

i.e. when we pass like
$clang .././p1.c -S -emit-llvm -g

IRGen gives single file entry like this
!1 = !DIFile(filename: ".././p1.c"

As you can see path is not canonicalized but it atleast does not have duplicate file entry.

Even if that .c file is referred to by a different path - what about a #include of that .c file (with some #ifdefs to avoid infinite include recursion) by the absolute path, for instance? does that still not end up with duplicate files with two different paths to the same file?

Is a leading .. okay or a problem?

yes It's ok in the sense ,it does not create duplicate file entry in debug info.

Do we have a similar problem if the filespec has an embedded ./ or ../ in it?

problems occur only when filespec starts with ./ otherwise it's fine.

So do other kinds of paths get canonicalized elsewhere? I'm thinking if there's already a place that undoes embedded .. then it should handle leading . as well.

other canonicalization happens here
https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGDebugInfo.cpp#L541
and
https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGDebugInfo.cpp#L441

but that does not undo any embedded . and .. anywhere,except leading ./

i.e. when we pass like
$clang .././p1.c -S -emit-llvm -g

IRGen gives single file entry like this
!1 = !DIFile(filename: ".././p1.c"

As you can see path is not canonicalized but it atleast does not have duplicate file entry.

Even if that .c file is referred to by a different path - what about a #include of that .c file (with some #ifdefs to avoid infinite include recursion) by the absolute path, for instance? does that still not end up with duplicate files with two different paths to the same file?

No, that does not happen. Because in that case we create file entry for first time with absolute path and later we always get the same file with getOrCreateFile.

The duplicate File entry only happens with the file passed by driver -main-file-name p1.cc which used in creating compilation unit.
https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGDebugInfo.cpp#L518

and while creating debuginfo for variable / function it refers to file by quering getOrCreateFile with relative path specified on commmand line and it sees that there is no such entry and ends up creating new file entry with this path hence duplication.
https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGDebugInfo.cpp#L406

Is a leading .. okay or a problem?

yes It's ok in the sense ,it does not create duplicate file entry in debug info.

Are we sure we want to canonicalize *before* applying -fdebug-prefix-map in remapDIPath()? Honest question, the answer could be yes :-)

it canonicalizes before apply -fdebug-prefix-map in remapDIPath() only when creating compilation unit with filename passed by driver option -main-file-name
https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGDebugInfo.cpp#L615

Other than that it first apply -fdebug-prefix-map in remapDIPath() then try to canonicalize.
https://github.com/llvm-mirror/clang/blob/master/lib/CodeGen/CGDebugInfo.cpp#L441

Do we have a similar problem if the filespec has an embedded ./ or ../ in it? I'm thinking some broader canonicalization ought to be done here.
$ clang ./dir1/dir2/../dir3/file.c
should resolve to dir1/dir3/file.c shouldn't it?

I would be very careful about aggressive canonicalization. Once you throw symlinks into the mix, there's very little things you can safely do. If dir2 is a symlink then dir2/.. can point pretty much anywhere. And if you use something like realpath there's no telling whether the final result actually be the thing you actually want to put into the debug info (your whole source tree could be a "symlink farm" with individual files pointing to random unpredictable locations).

It seems to me that the underlying problem here is that there are different kinds of canonicalization applied to the "main" and other files, which is something that the comment in CDDebugInfo::CreateCompileUnit seems to acknowledge. However, I don't know anything about this code to say how/if it is possible to fix that...