This is an archive of the discontinued LLVM Phabricator instance.

CodeGen: Don't try to canonicalize Unix-style paths in CodeView debug info.
ClosedPublic

Authored by pcc on Apr 9 2018, 7:02 PM.

Details

Summary

Most importantly, we should not replace slashes with backslashes
because that would invalidate the path.

Event Timeline

pcc created this revision.Apr 9 2018, 7:02 PM
zturner added inline comments.Apr 9 2018, 7:25 PM
llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
117–125

Shouldn't we instead just use something like LLVM_ON_WIN32 etc?

pcc added inline comments.Apr 9 2018, 7:40 PM
llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
117–125

I don't think so. Imagine that you have something like a ThinLTO distributed build where bitcode files are copied to a remote machine where code generation happens. In that case the code generator won't necessarily be running the same operating system as the bitcode producer was, and the consumer of the object file will be expecting a path style consistent with the bitcode producer.

rnk added a comment.Apr 10 2018, 10:46 AM

Can you elaborate on the use case? What tool do you expect will consume these Unix style paths? What aspect of the path is invalidated by changing the slash style?

I think the right way to address these distributed cross-compilation use cases is probably going to be to have a flag like "-fdebug-compilation-dir" that the build system supplies with a Windows-style drive letter path.

pcc added a comment.Apr 10 2018, 1:23 PM
In D45473#1063321, @rnk wrote:

Can you elaborate on the use case? What tool do you expect will consume these Unix style paths?

We can use the paths to provide better linker error messages. See D45467.

What aspect of the path is invalidated by changing the slash style?

It means for example that users can't copy the path out of the linker error message and paste it to their editor.

I think the right way to address these distributed cross-compilation use cases is probably going to be to have a flag like "-fdebug-compilation-dir" that the build system supplies with a Windows-style drive letter path.

Yes this is one example of a case where a "foreign" path ends up being embedded in the directory component of a debug info path and the code generator will need to respect that and not the path convention of the host machine. A similar situation is where the frontend is run on the local machine and the backend runs on the remote machine -- this is the distributed thinlto situation. In that case the frontend will "naturally" embed host-style paths in the directory component of debug info paths and the backend will again need to respect them.

rnk added a comment.Apr 10 2018, 4:11 PM
In D45473#1063531, @pcc wrote:
In D45473#1063321, @rnk wrote:

Can you elaborate on the use case? What tool do you expect will consume these Unix style paths?

We can use the paths to provide better linker error messages. See D45467.

For this use case, is it actually important to preserve symlinks? Would it be enough to make the linker canonicalize the slashes to native format when printing the diagnostic?

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
117–125

Yes, this definitely shouldn't be conditional on macros, it's deep in LLVM codegen. llc should do the same thing with IR everywhere.

pcc added a comment.Apr 10 2018, 4:42 PM
In D45473#1063737, @rnk wrote:
In D45473#1063531, @pcc wrote:
In D45473#1063321, @rnk wrote:

Can you elaborate on the use case? What tool do you expect will consume these Unix style paths?

We can use the paths to provide better linker error messages. See D45467.

For this use case, is it actually important to preserve symlinks? Would it be enough to make the linker canonicalize the slashes to native format when printing the diagnostic?

Assuming that we don't care about symlinks, that would work I suppose, but it wouldn't necessarily do the right thing in distributed links, and it would mean that any other tools that consume CodeView on non-Windows (e.g. LLDB remote debugging a Windows binary) would need to do the same thing.

rnk accepted this revision.Apr 11 2018, 9:52 AM

OK, here are the two competing use cases that I am imagining:

  1. (yours) Full cross-compilation from Unix, with linker diagnostics referring to files that exist locally on Unix.
  2. Distributed cross-compilation on Unix, local linking on Windows with the VC linker.

I think the correct solution to 2 is going to be to use -fdebug-compilation-dir and -fdebug-prefix-map=/unix/path=C:/windows/path so that these DIFile directory components start with a drive letter that makes sense on the system doing the link.

Basically, if we have a DIFile directory that starts with /, let's trust that the producer wants Unix paths in their CodeView.

This revision is now accepted and ready to land.Apr 11 2018, 9:52 AM
pcc added a comment.Apr 11 2018, 11:12 AM
In D45473#1064430, @rnk wrote:

OK, here are the two competing use cases that I am imagining:

  1. (yours) Full cross-compilation from Unix, with linker diagnostics referring to files that exist locally on Unix.
  2. Distributed cross-compilation on Unix, local linking on Windows with the VC linker.

I think the correct solution to 2 is going to be to use -fdebug-compilation-dir and -fdebug-prefix-map=/unix/path=C:/windows/path so that these DIFile directory components start with a drive letter that makes sense on the system doing the link.

Basically, if we have a DIFile directory that starts with /, let's trust that the producer wants Unix paths in their CodeView.

My thoughts exactly.

This revision was automatically updated to reflect the committed changes.