This is an archive of the discontinued LLVM Phabricator instance.

[DebugInfo] Fix file path separator when targeting windows.
ClosedPublic

Authored by zequanwu on Mar 30 2023, 12:13 PM.

Details

Summary

This fixes two problems:

  1. When crossing compiling for windows on linux, source file path in debug info is concatenated with directory by host native separator ('/'). For windows local build, they are concatenated by '\'. This causes non-determinism bug.

    The solution here is to let LangOptions.UseTargetPathSeparator to control if we should use host native separator or not.
  1. Objectfile path in CodeView also uses host native separator when generated.

    It's fixed by changing the path separator in /Fo to '\' if the path is not an absolute path when adding the -object-file-name= flag.

Diff Detail

Event Timeline

zequanwu created this revision.Mar 30 2023, 12:13 PM
Herald added a project: Restricted Project. · View Herald TranscriptMar 30 2023, 12:13 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
zequanwu requested review of this revision.Mar 30 2023, 12:13 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMar 30 2023, 12:13 PM
zequanwu edited the summary of this revision. (Show Details)Mar 30 2023, 12:15 PM
hans added a comment.Mar 31 2023, 6:07 AM

Thanks for working on this!

The -ffile-reproducible flag name refers to making #file directives reproducible, but LangOptions.UseTargetPathSeparator sounds a lot broader :) I don't know what others think, but it would be nice to not have to introduce any more flags at least.

clang/lib/CodeGen/CGDebugInfo.cpp
544

Do we want to fix absolute filenames too?
I can see arguments for and against:

  • Using what the user provided makes sense
  • Some build systems might use absolute paths for everything. But on the other hand such builds have larger determinism problems (including the full paths).

So the current decision probably makes sense.

clang/test/CodeGen/debug-info-slash.c
6

Does the test runner write the 'clang/test/CodeGen' path with forward slashes also on Windows?

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
787

This handles codeview. Does anything need to be done for dwarf on windows? mstorsjo might have input on that.

llvm/test/DebugInfo/COFF/build-info.ll
13 ↗(On Diff #509771)

s/canalized/canonicalized/

18 ↗(On Diff #509771)

Does this write the .o file into the test directory? I don't think that's allowed (it may be read-only). But the test could create another subdirectory under %t-dir.

21 ↗(On Diff #509771)

But in the llc invocation, the user wrote a relative path with a forward slash. What behavior do we want? What should happen if there are more then one slash - I think remove_dots just works on the first one?

zequanwu updated this revision to Diff 510057.Mar 31 2023, 9:23 AM
zequanwu marked 2 inline comments as done.

Address comments.

Thanks for working on this!

The -ffile-reproducible flag name refers to making #file directives reproducible, but LangOptions.UseTargetPathSeparator sounds a lot broader :) I don't know what others think, but it would be nice to not have to introduce any more flags at least.

Oh, I was already using LangOptions.UseTargetPathSeparator. Updated the comment on UseTargetPathSeparator.

clang/lib/CodeGen/CGDebugInfo.cpp
544

Yeah. If it's already absolute filename, it will just use that one user provided.

clang/test/CodeGen/debug-info-slash.c
6

No, it uses clang\\test\\CodeGen. Removed this part.

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
787

It looks like TM.Options.ObjectFilenameForDebug is only used for codeview. I guess dwarf doesn't store the object file path.

llvm/test/DebugInfo/COFF/build-info.ll
18 ↗(On Diff #509771)

It writes the .o file into the parent directory of the temporary dir %t-dir. It will just be the directory path of a normal %t, not the source test directory. The reason I'm not using %t-dir/build-info.o in the parent dir is because it will be translated into an absolute address. That will remain unchanged in ObjectName.

21 ↗(On Diff #509771)

Yeah, the input uses forward slash but we convert it into backslash. When there are multiple slashs, they will all be converted into backslashs, which is done by llvm::sys::path::remove_dots(PathStore, /*remove_dot_dot=*/true, llvm::sys::path::Style::windows_backslash);. remove_dots not only remove redundant dots but also canonicalize path separator based on the style.

I think ideally we want just take what user provided as the ObjectName with the redundant dots removed. But remove_dots always canonicalizes the path separator based on the style. I don't find any function that doesn't canonicalize the path.

zequanwu edited the summary of this revision. (Show Details)Mar 31 2023, 9:23 AM

I think we cannot be 100% sure about source paths in a cross-compile situation. Cross-compiling on platform A targeting platform B does not mean your sources and debugger UI are on platform B. My users keep source and debugger UI on platform A, debugging target B remotely. We need to preserve the host pathnames. It is not clear to me that this patch does so.

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
787

Right, DWARF only stores the source path.

hans added a comment.Mar 31 2023, 10:43 AM

I think we cannot be 100% sure about source paths in a cross-compile situation. Cross-compiling on platform A targeting platform B does not mean your sources and debugger UI are on platform B. My users keep source and debugger UI on platform A, debugging target B remotely. We need to preserve the host pathnames. It is not clear to me that this patch does so.

I believe the idea is to preserve host pathnames as long as LangOptions.UseTargetPathSeparator is unset. But I see that might not be the case for CodeViewDebug::emitObjName() in the current version of the patch. Could we make clang handle that path instead depending on the flag?

llvm/test/DebugInfo/COFF/build-info.ll
18 ↗(On Diff #509771)

Right, I'm just thinking that we can't be sure that we can (or should) write files into the parent of %t-dir.

Would it work if you instead create %t-dir/subdir, cd into that, and then do -o ../build-info.o?

zequanwu updated this revision to Diff 511118.EditedApr 5 2023, 8:42 AM
  • Pass LangOptions.UseTargetPathSeparator down to TargetOptions to use host path separator when it's set.
  • Add a test case. The problem is when if we are targeting to linux, clang won't generate codeview debuginfo with -gcodeview. So, I only add the test to target on windows.
zequanwu edited the summary of this revision. (Show Details)Apr 5 2023, 8:42 AM

An LLVM code change should be testable on its own; this has it tested by Clang.
I think you need a new command-line option to set TargetOptions::UseTargetPathSeparator e.g. via llvm-mc. Other TargetOptions are handled this way.

zequanwu updated this revision to Diff 511496.Apr 6 2023, 11:50 AM
  • Add a -use-target-path-separator flag for llc.
  • Add test for llc with that flag.
hans added a comment.Apr 11 2023, 5:21 AM
  • Add a -use-target-path-separator flag for llc.
  • Add test for llc with that flag.

But where does TM.Options.ObjectFilenameForDebug come from? Presumably it comes from Clang at some point, so is there any chance we can fix it "at the source" instead?

  • Add a -use-target-path-separator flag for llc.
  • Add test for llc with that flag.

But where does TM.Options.ObjectFilenameForDebug come from? Presumably it comes from Clang at some point, so is there any chance we can fix it "at the source" instead?

TM.Options.ObjectFilenameForDebug either comes from llc's -o or clang-cl's object-file-name= which is translated from /Fo[ObjectFileName].

For Chromium, the /Fo[ObjectFileName] we pass to clang-cl is the same when building on Windows and targeting Windows from Linux. The problem comes llvm::sys::path::remove_dots(PathStore, /*remove_dot_dot=*/true); in CodeViewDebug.cpp. That function always convert the path to use host's path separator. And I don't think we can write a remove_dots function that doesn't change path separator, because \ can only be used as path separator on Windows but is a valid path character on Linux.

Or we could just not use llvm::sys::path::remove_dots, use the user's input as it is for object file path.

  • Add a -use-target-path-separator flag for llc.
  • Add test for llc with that flag.

But where does TM.Options.ObjectFilenameForDebug come from? Presumably it comes from Clang at some point, so is there any chance we can fix it "at the source" instead?

TM.Options.ObjectFilenameForDebug either comes from llc's -o or clang-cl's object-file-name= which is translated from /Fo[ObjectFileName].

For Chromium, the /Fo[ObjectFileName] we pass to clang-cl is the same when building on Windows and targeting Windows from Linux. The problem comes llvm::sys::path::remove_dots(PathStore, /*remove_dot_dot=*/true); in CodeViewDebug.cpp. That function always convert the path to use host's path separator. And I don't think we can write a remove_dots function that doesn't change path separator, because \ can only be used as path separator on Windows but is a valid path character on Linux.

Or we could just not use llvm::sys::path::remove_dots, use the user's input as it is for object file path.

Well, MSVC cl removes redundant dots so we shouldn't remove llvm::sys::path::remove_dots.
Generally, I think it's okay to use target path separator for object file path in codeview when user input is not an absolute path, because we can only generate codeview when targeting on windows.

hans added a comment.Apr 12 2023, 9:07 AM

Well, MSVC cl removes redundant dots so we shouldn't remove llvm::sys::path::remove_dots.

Could we do the remove_dots on the Clang side, where we can decide based on the LangOpts?

Well, MSVC cl removes redundant dots so we shouldn't remove llvm::sys::path::remove_dots.

Could we do the remove_dots on the Clang side, where we can decide based on the LangOpts?

Yes, we can do that. Is there a reason to favor there over here? At here, the object path in llc output can also benefits from remove_dots, not just clang.

hans added a comment.Apr 12 2023, 11:45 AM

Well, MSVC cl removes redundant dots so we shouldn't remove llvm::sys::path::remove_dots.

Could we do the remove_dots on the Clang side, where we can decide based on the LangOpts?

Yes, we can do that. Is there a reason to favor there over here? At here, the object path in llc output can also benefits from remove_dots, not just clang.

The benefit is that if we do it in Clang, we can just look at the LangOpts and don't have to add a new target option, command line flag etc.

For llc, since it's an internal tool, I think it would be fine to just use the name as given.

zequanwu updated this revision to Diff 513252.Apr 13 2023, 8:36 AM
  • Move remove_dots to clang driver.
  • Revert changes to llc.
zequanwu edited the summary of this revision. (Show Details)Apr 13 2023, 8:37 AM
hans added inline comments.Apr 14 2023, 5:42 AM
clang/lib/Driver/ToolChains/Clang.cpp
583

Won't the code above (line 580) make many filenames absolute and cause us to use native slashes even when we want backslashes?

This would also need a test.

zequanwu updated this revision to Diff 513611.Apr 14 2023, 8:32 AM

Added clang-cl option test.

clang/lib/Driver/ToolChains/Clang.cpp
583

Won't the code above (line 580) make many filenames absolute and cause us to use native slashes even when we want backslashes?

Yes, I don't think we should change anything if it's converted to an absolute path.

Only if the -fdebug-compilation-dir is not given or is an absolute path, the path in /Fo will be converted to absolute path. Users can avoid the path being converted to absolute path by passing a relative path to -fdebug-compilation-dir, just like what we do in chrome build (-fdebug-compilation-dir=.).

Added a test.

hans added inline comments.Apr 17 2023, 4:28 AM
clang/lib/Driver/ToolChains/Clang.cpp
583

Apologies for all the questions, but I find the logic a bit complex still. The test cases are helpful though.

Could we simplify by just always using Windows slashes here, regardless of -fdebug-compilation-dir etc.?

zequanwu added inline comments.Apr 17 2023, 7:55 AM
clang/lib/Driver/ToolChains/Clang.cpp
583

The object file path will be converted to absolute path depending on whether -fdebug-compilation-dir is given or whether it's absolute path. I don't think we want to change a native absolute path of object file to use Windows slashes. Or this could happen on linux: /usr/local/src/a.obj -> \usr\local\src\a.obj. So, we only use windows slashes when it's not an absolute path here.

hans accepted this revision.Apr 17 2023, 9:52 AM

lgtm

clang/lib/Driver/ToolChains/Clang.cpp
583

Thanks, that makes sense. Can you add a comment with that explanation?

This revision is now accepted and ready to land.Apr 17 2023, 9:52 AM
zequanwu updated this revision to Diff 514287.Apr 17 2023, 10:07 AM
zequanwu marked an inline comment as done.

Add comment.

clang/lib/Driver/ToolChains/Clang.cpp
583

Done.

This revision was landed with ongoing or failed builds.Apr 17 2023, 10:07 AM
This revision was automatically updated to reflect the committed changes.