This is an archive of the discontinued LLVM Phabricator instance.

[CodeView] Add full repro to LF_BUILDINFO record
ClosedPublic

Authored by aganea on May 29 2020, 12:42 PM.

Details

Summary

This patch adds some missing information to the LF_BUILDINFO which allows for rebuilding a .CPP without any external dependency but the .OBJ itself (other than the compiler).

Some external tools that we are using (Recode, Live++) are extracting the information to reproduce a build without any knowledge of the build system. The LF_BUILDINFO therefore stores a full path to the compiler, the PWD, a relative or absolute path to the TU, and the full CC1 command line. The command line needs to be freestanding (not depend on any environment variables). In the same way, MSVC doesn't store the provided command-line, but an expanded version (somehow their equivalent of CC1) which is also freestanding.

For more information see PR36198 and D43002.

Diff Detail

Unit TestsFailed

Event Timeline

aganea created this revision.May 29 2020, 12:42 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptMay 29 2020, 12:42 PM
akhuang added inline comments.Jun 1 2020, 2:24 PM
llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
938–939

Can the 'path to compiler and command line' part of the FIXME be removed?

aganea updated this revision to Diff 267880.Jun 2 2020, 7:24 AM
aganea marked an inline comment as done.

As requested.

hans added inline comments.Jun 2 2020, 7:30 AM
clang/include/clang/Basic/CodeGenOptions.h
320 ↗(On Diff #267331)

The name BuildTool makes me think of things like Make or MSBuild rather than the compiler. And in this case we know the compiler is Clang :-) Also since this is for capturing the cc1 arguments, it shouldn't really matter if it's clang-cl, clang++ or just clang. So it seems unfortunate that it has to be piped through all the way like this..

Is there some way we could just grab the path to the current clang binary during the pdb writing?

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
892

Don't we already have code somewhere that can do this quoting? E.g. the code that prints the cc1 args for "clang -v"?

aganea marked an inline comment as done.Jun 2 2020, 7:35 AM
aganea added inline comments.
clang/test/CodeGen/debug-info-codeview-buildinfo.c
4

Is there a way to launch an arbitrary program from lit? I started extracting the tool and the cmd-line from the .OBJ, to run it again. But I couldn't figure out how to start the extracted tool pathname? Something along the lines of

// RUN: eval $CMD
aganea marked 2 inline comments as done.Jun 2 2020, 8:01 AM
aganea added inline comments.
clang/include/clang/Basic/CodeGenOptions.h
320 ↗(On Diff #267331)

There's sys::fs::getMainExecutable(const char *argv0, void *MainExecAddr) but that's not guaranteed to work on all platforms, you still need to provide argv[0] to fallback, see https://github.com/llvm/llvm-project/blob/master/llvm/lib/Support/Unix/Path.inc#L232
I'm not well versed in Unix distros, can this only rarely occur?
Would you prefer that I used that function instead? (it is ok on Windows though)

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
892

Yes, the code below was copy-pasted from Command::printArg. But that's in Clang. Should I make that code common somewhere in llvm/lib/Support?

hans added inline comments.Jun 2 2020, 10:24 AM
clang/include/clang/Basic/CodeGenOptions.h
320 ↗(On Diff #267331)

I think it can occur for example if the compiler is running in a chroot. And probably other cases as well.
So maybe piping the path through is better. But perhaps we should just call it argv0?
Or could we get away with putting just "clang" there? Does the tool consuming this info actually need the real path to the compiler or could we rely on it finding it on PATH?

clang/test/CodeGen/debug-info-codeview-buildinfo.c
4

Hmm, I can't think of any way to do that with lit. But it's probably enough to check that the the data looks reasonable without actually invoking the tool again?

Does the full path to the tool end up only in the object file, or does this make it all the way to the final executable or library? Does embedding full paths affect distributed builds or build reproducibility?

aganea marked an inline comment as done.EditedJun 2 2020, 12:46 PM

Does the full path to the tool end up only in the object file, or does this make it all the way to the final executable or library?

The LF_BUILDINFO records for each .OBJ are emitted as they are in the final .PDB.

Does embedding full paths affect distributed builds or build reproducibility?

It does affect reproducibility for distributed builds, the contents of LF_BUILDINFO for a remotely-compiled .OBJ will be different in the final .PDB.
In our case, compilation jobs can be executed locally or remotely, depending on how they are queued.
For example,

  • If a local compilation, C:\.nuget\llvm.10.0.0.2\bin\clang-cl.exe would be stamped in the LF_BUILDINFO record.
  • If a remote compilation, C:\Users\SomeUser\AppData\Local\Temp\.fbuild.tmp\worker\toolchain.130589cdf35aed3b\clang-cl.exe would be stamped (the compiler executable is sent to the remote worker).

But you've got a good point. We would need an way to force the remote compiler to stamp the local path (which is unique for all users). We've got a manual override if the compiler path referred by LF_BUILDINFO can't be found locally, but this is a bit annoying, see: https://liveplusplus.tech/docs/documentation.html#FASTBuild.
It is better if the right information was there in the first place in the .OBJ. Could we do a remapping instead through VFS maybe? I'd like to avoid adding yet-a-new-flag to handle that.

clang/include/clang/Basic/CodeGenOptions.h
320 ↗(On Diff #267331)

Ok for argv0.
Users can have several versions of a toolchain installed locally, if they switch between branches or if working on different games at the same time.
We need to identify the right compiler through its full path, we can't rely on PATH.

thakis added a comment.Jun 2 2020, 3:50 PM

The change description says something about PWD (which makes me worried about build determinism, like amccarth), but I don't see anything about that in the diff.

The diff as-is looks fine to me with hans's comment about duplication resolved in some form. Please don't add code that stores absolute paths, but at the moment this doesn't do that as far as I can tell :)

The change description says something about PWD

I believe that was a typo for CWD.

Please don't add code that stores absolute paths, but at the moment this doesn't do that as far as I can tell :)

Ah, but the summary suggests that it is:

The LF_BUILDINFO therefore stores a full path to the compiler....

From what I see, it's storing Args[0], which is usually the program name as specified on the command line. If a build system is using full paths in the commands, I think they will be stored.

clang/test/CodeGen/debug-info-codeview-buildinfo.c
9

PWD? Did you mean CWD (current working directory)?

aganea marked an inline comment as done.EditedJun 3 2020, 10:44 AM

I didn't change the PWD/CWD stored in LF_BUILDINFO, it has already been done by Reid a while ago: D53179 (it already stores absolute paths)

However absolute paths are stored by design, if we want to reproduce the build locally. I can't see how this feature could be useful if we didn't store absolute paths. The user could have many different branches of the same game synced locally (we're using perforce).

We could somehow defer the full path calculation to link-time if we didn't want full paths in the .OBJs. But each LF_BUILDINFO in the final .PDB needs to store full local paths.
Please let me know what direction should be taken.

This is what I have now:

> cat b.cpp
int f() { return 42; }

# With Microsoft cl.exe (VS2019):

> cl b.cpp /c /Z7
> llvm-pdbutil dump -all b.obj | grep LF_BUILDINFO -A 5 -B 6
0x1007 | LF_STRING_ID [size = 248] ID: <no type>, String: -c -Z7 -MT -I"C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.26.28801\ATLMFC\include" -I"C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.26.28801\include" -I"C:\Program
0x1008 | LF_STRING_ID [size = 268] ID: <no type>, String:  Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um"
0x1009 | LF_SUBSTR_LIST [size = 16]
         0x1007: `-c -Z7 -MT -I"C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.26.28801\ATLMFC\include" -I"C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.26.28801\include" -I"C:\Program`
         0x1008: ` Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um"`
0x100A | LF_STRING_ID [size = 160] ID: 0x1009, String:  -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" -TP -X
0x100B | LF_BUILDINFO [size = 28]
         0x1003: `D:\llvm-project`
         0x1004: `C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.26.28801\bin\HostX64\x64\cl.exe`
         0x1005: `b.cpp`
         0x1006: `D:\llvm-project\vc140.pdb`
         0x100A: ` -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" -I"C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" -TP -X`

# With Clang:

> clang-cl b.cpp /c /Z7
> llvm-pdbutil dump -all b.obj | grep LF_BUILDINFO -A 5
0x1007 | LF_BUILDINFO [size = 28]
         0x1003: `D:\llvm-project`
         0x1005: `D:\llvm-project\buildninjaDebMSVC\bin\clang-cl.exe`
         0x1004: `b.cpp`
         <no type>: ``
         0x1006: `-cc1 -triple x86_64-pc-windows-msvc19.26.28806 -emit-obj -mrelax-all -mincremental-linker-compatible -disable-free -main-file-name -mrelocation-model pic -pic-level 2 -mthread-model posix -mframe-pointer=none -relaxed-aliasing -fmath-errno -fno-rounding-math -mconstructor-aliases -munwind-tables -target-cpu x86-64 -mllvm -x86-asm-syntax=intel -D_MT -flto-visibility-public-std --dependent-lib=libcmt --dependent-lib=oldnames -stack-protector 2 -fms-volatile -fdiagnostics-format msvc -gcodeview -debug-info-kind=limited -resource-dir "D:\llvm-project\buildninjaDebMSVC\lib\clang\11.0.0" -internal-isystem "D:\llvm-project\buildninjaDebMSVC\lib\clang\11.0.0\include" -internal-isystem "C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.26.28801\ATLMFC\include" -internal-isystem "C:\Program Files (x86)\Microsoft Visual Studio\2019\Professional\VC\Tools\MSVC\14.26.28801\include" -internal-isystem "C:\Program Files (x86)\Windows Kits\NETFXSDK\4.8\include\um" -internal-isystem "C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\ucrt" -internal-isystem "C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\shared" -internal-isystem "C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\um" -internal-isystem "C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\winrt" -internal-isystem "C:\Program Files (x86)\Windows Kits\10\include\10.0.18362.0\cppwinrt" -fdeprecated-macro -fdebug-compilation-dir "D:\llvm-project" -ferror-limit 19 -fmessage-length=120 -fno-use-cxa-atexit -fms-extensions -fms-compatibility -fms-compatibility-version=19.26.28806 -std=c++14 -fdelayed-template-parsing -fcolor-diagnostics -faddrsig -o b.obj -x c++ `
clang/test/CodeGen/debug-info-codeview-buildinfo.c
9

I meant the current working directory when the program starts. I was under the impression that the nomenclature for that is PWD = CWD at startup. While CWD is the current directory at any point in time during the execution, and can be different of PWD.
Would you prefer if I changed the regex capture name to CWD?

aganea added a subscriber: rnk.Jun 3 2020, 10:45 AM
hans added a comment.Jun 3 2020, 11:12 AM

I didn't change the PWD/CWD stored in LF_BUILDINFO, it has already been done by Reid a while ago: D53179 (it already stores absolute paths)

That's surprising to me. Nico is the real export on reproducible builds, but we generally try very hard to make builds reproducible independent of source and build dir. Maybe there should be a flag controlling this? /Brepro?

We don't store physical absolute paths in pdbs if you hold things right. Look for /pdbsourcepath: in http://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html . rnk's change seems to store the main TU's dir as cwd, which likely either takes the dir from the main TU as passed (ie if you pass an erlative path, it stores that), or it honors /pdbsourcepath:. If that's fine as-is, that's ok. It's different from what pwd returns though I think. (imagine you're in c:\foo and you build bar\baz.cc. the cwd in the pdb will be c:\foo\bar but the command will be relative to c:\foo if I understand things right? I might not though.)

aganea updated this revision to Diff 268359.Jun 3 2020, 8:56 PM
  • Using sys::flattenWindowsCommandLine instead of previous quoting function, as suggested by @hans
  • Added Clang tests for -fdebug-compilation-dir .
  • Added LLD support for making an absolute path out of LF_BUILDINFO's CurrentDirectory.
aganea marked 5 inline comments as done.Jun 3 2020, 9:16 PM

I didn't change the PWD/CWD stored in LF_BUILDINFO, it has already been done by Reid a while ago: D53179 (it already stores absolute paths)

That's surprising to me. Nico is the real export on reproducible builds, but we generally try very hard to make builds reproducible independent of source and build dir. Maybe there should be a flag controlling this? /Brepro?

Ah, you must be using -fdebug-compilation-dir on Chromium I suppose, to make paths relative in .OBJs. Otherwise the default is the full path to PWD/CWD. Added a test for that.

We don't store physical absolute paths in pdbs if you hold things right. Look for /pdbsourcepath: in http://blog.llvm.org/2019/11/deterministic-builds-with-clang-and-lld.html . rnk's change seems to store the main TU's dir as cwd, which likely either takes the dir from the main TU as passed (ie if you pass an erlative path, it stores that), or it honors /pdbsourcepath:. If that's fine as-is, that's ok. It's different from what pwd returns though I think. (imagine you're in c:\foo and you build bar\baz.cc. the cwd in the pdb will be c:\foo\bar but the command will be relative to c:\foo if I understand things right? I might not though.)

Without -fdebug-compilation-dir, the full path is stored in LF_BUILDINFO (see CodeGenOpts::DebugCompilationDir and CGDebugInfo::getCurrentDirname())
With -fdebug-compilation-dir ., the PWD/CWD becomes relative ('.') and I've added code in LLD to call pdbMakeAbsolute on it, like for other PDB strings.
Now /pdbsourcepath works as well (all this is covered in lld/test/COFF/pdb-relative-source-lines.test).
However if you're in c:\foo and you do clang-cl /c bar\baz.cc, then bar\baz.cc will be stored as-it-is in LF_BUILDINFO (the PWD/CWD and the main filename are stored separately)

The only absolute paths that remain are: a. the compiler path (D:\llvm-project\buildninjaDebMSVC\bin\clang-cl.exe in the yaml below) and b. the -internal-isystem paths. However that is not an issue on our end, as we're building with -nostdinc + nuget packages installed in a fixed dir on all users' machines. Not sure how that works for you?

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
892

Replaced by llvm::sys::flattenWindowsCommandLine. I can't find any equivalent for Linux, I hope just aggregating the arguments with spaces in between is fine?

hans added a comment.Jun 4 2020, 6:14 AM

The only absolute paths that remain are: a. the compiler path (D:\llvm-project\buildninjaDebMSVC\bin\clang-cl.exe in the yaml below) and b. the -internal-isystem paths. However that is not an issue on our end, as we're building with -nostdinc + nuget packages installed in a fixed dir on all users' machines. Not sure how that works for you?

a. The compiler path is only absolute because it was absolute when put into argv[0] right? I don't see anything in the code that makes it absolute? I imagine most build systems will invoke the compiler using an absolute path so you'll get the desired result.

b. We pass -imsvc flags to specify relative paths to the msvc headers, so the -internal-isystem paths passed to cc1 are relative too.

clang/test/CodeGen/debug-info-codeview-buildinfo.c
2

The %s arg needs to come after "--", otherwise it can be interpreted as a command line flag, e.g. on Mac files are often under /Users which will be interpreted as a /U flag (see other tests using %clang_cl for examples).

clang/tools/driver/cc1_main.cpp
184 ↗(On Diff #268359)

Maybe I'm missing something, but why is this changing? The current code with Argv0 and the rest of the args as separate parameters seems like it would fit in will with the rest of the patch?

lld/COFF/PDB.cpp
1041 ↗(On Diff #268359)

Naive question because I'm not familiar with the PDB writing, but would it be possible to remap the LF_BUILDINFO entries earlier, e.g. when they're read in from the object files? It seems like a lot of code is now added to do the remapping "after the fact".

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
896

Does Arg.data() == nullptr really happen here?

902

I suppose this won't work with filenames that contain spaces.

I was hoping we could do whatever "clang -v" does when it prints arguments. It seems to do some basic quoting and escaping. Does it have some function we could call from here?

939–948

Wouldn't it be simpler to just check Argv0 != nullptr here?

aganea updated this revision to Diff 268881.EditedJun 5 2020, 11:01 AM
aganea marked 11 inline comments as done.

Addressed comments (see inline). Ensured the generated .OBJ contains relative paths when passing -fdebug-compilation-dir . -no-canonical-prefixes.

a. The compiler path is only absolute because it was absolute when put into argv[0] right? I don't see anything in the code that makes it absolute? I imagine most build systems will invoke the compiler using an absolute path so you'll get the desired result.

Understood, I wasn't using -no-canonical-prefixes that's why I was seeing full paths for argv[0].

clang/test/CodeGen/debug-info-codeview-buildinfo.c
2

Fixed, Reid already mentionned that a while ago, I'll remember next time!

clang/tools/driver/cc1_main.cpp
184 ↗(On Diff #268359)

Reverted.

lld/COFF/PDB.cpp
1041 ↗(On Diff #268359)

The whole type merging machinery isn't designed for changing records on-the-fly. We only modify TypeIndex values inside the record, but for that we have hardcoded offsets (all the discoverTypeIndices functions in https://github.com/llvm/llvm-project/blob/cda7ff9ddcefe0051d173e7c126c679063d29fbb/llvm/lib/DebugInfo/CodeView/TypeIndexDiscovery.cpp). The rest of the record is emitted as-it-is in the output TPI or IPI streams.

It's certainly feaseable to modify records in TypeStreamMerger in the same way we examine the LF_ENDPRECOMP record. However that requires a lot more piping, and I figured it wasn't worth it (unless we have other record types to modify in the future).

llvm/lib/CodeGen/AsmPrinter/CodeViewDebug.cpp
896

Yes if passing clang -cc1 directly. Both the CC1Command and InitLLVM are pushing a nullptr terminator as the last arg :-( This seems quite widespead behavior across the codebase.

902

The code for printing "clang -v" is/was entirely in Clang: https://github.com/llvm/llvm-project/blob/master/clang/lib/Driver/Job.cpp#L103

I've moved Command::printArg to llvm::sys::printArg and using that now.
This creates a slight difference from MSVC, in the sense that with Clang all arguments are quoted, no matter what. Whereas MSVC adds quotes when it's necessary (that's LLVM's sys::flattenWindowsCommandLine behavior). But I think that shouldn't matter much.

Please @stefan_reinalter @lantictac let me know if you think otherwise.

D:\llvm-project> buildninjaRel\bin\clang-cl /c b.cpp /Z7 -fdebug-compilation-dir . -no-canonical-prefixes
D:\llvm-project> buildninjaRel\bin\llvm-pdbutil.exe dump -all b.obj | grep LF_BUILDINFO -A 5
0x1007 | LF_BUILDINFO [size = 28]
         0x1003: `.`
         0x1005: `buildninjaRel\bin\clang-cl.exe`
         0x1004: `b.cpp`
         <no type>: ``
         0x1006: `"-cc1" "-triple" "x86_64-pc-windows-msvc19.26.28806" "-emit-obj" "-mrelax-all" "-mincremental-linker-compatible" "-disable-free" "-main-file-name" "b.cpp" "-mrelocation-model" "pic" "-pic-level" "2" "-mthread-model" "posix" "-mframe-pointer=none" "-relaxed-aliasing" "-fmath-errno" "-fno-rounding-math" "-mconstructor-aliases" "-munwind-tables" "-target-cpu" "x86-64" "-mllvm" "-x86-asm-syntax=intel" "-D_MT" "-flto-visibility-public-std" "--dependent-lib=libcmt" "--dependent-lib=oldnames" "-stack-protector" "2" "-fms-volatile" "-fdiagnostics-format" "msvc" "-gcodeview" "-debug-info-kind=limited" "-resource-dir" "buildninjaRel\\lib\\clang\\11.0.0" "-internal-isystem" "buildninjaRel\\lib\\clang\\11.0.0\\include" "-internal-isystem" "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Professional\\VC\\Tools\\MSVC\\14.26.28801\\ATLMFC\\include" "-internal-isystem" "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Professional\\VC\\Tools\\MSVC\\14.26.28801\\include" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\NETFXSDK\\4.8\\include\\um" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\ucrt" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\shared" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\um" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\winrt" "-internal-isystem" "C:\\Program Files (x86)\\Windows Kits\\10\\include\\10.0.18362.0\\cppwinrt" "-fdeprecated-macro" "-fdebug-compilation-dir" "." "-ferror-limit" "19" "-fmessage-length=146" "-fno-use-cxa-atexit" "-fms-extensions" "-fms-compatibility" "-fms-compatibility-version=19.26.28806" "-std=c++14" "-fdelayed-template-parsing" "-fcolor-diagnostics" "-faddrsig" "-x" "c++"`
aganea updated this revision to Diff 268906.Jun 5 2020, 12:08 PM

Fix tests.

hans accepted this revision.Jun 8 2020, 2:12 AM

lgtm

clang/test/CodeGen/debug-info-codeview-buildinfo.c
4

This one also needs "-- %s" at the end.

This revision is now accepted and ready to land.Jun 8 2020, 2:12 AM
amccarth accepted this revision.Jun 12 2020, 3:45 PM

My only real concern was about the possible reproducibility impact of introducing absolute paths into the binaries. If @thakis and @hans are satisfied this is OK, then LGTM.

clang/test/CodeGen/debug-info-codeview-buildinfo.c
9

Sorry, I always misread "pwd" as "password."

This revision was automatically updated to reflect the committed changes.
aganea added a comment.EditedJun 18 2020, 7:47 AM

Thanks for taking the time @hans, @amccarth!

I've split this into several smaller patches, to ease bisect if needs be:
rGa45409d8855a1e4538990507ef25e9b51c090193 - [Clang] Move clang::Job::printArg to llvm::sys::printArg. NFCI.
rG24eff42ba4b85eaa5429e8efe9bd2070d34ba1f7 - [CodeView] Add TypeCollection::replaceType to replace type records post-merging
rG89ea0b05207d45c145fb525df554b3b986ae379b - [MC] Pass down argv0 & cc1 cmd-line to the back-end and store in MCTargetOptions
rG403f9537924b8910ed4f741ed96c61f5e657915b - [CodeView] Add full repro to LF_BUILDINFO record <- this enables the feature

I've supplemented this current patch with two fixes:
rG8374bf43634725dc02a262a77b5f940fca25938c - [CodeView] Fix generated command-line expansion in LF_BUILDINFO. Fix the 'pdb' entry which was previously a null reference, now an empty string.
rGcab3fc53d2e173243a462e9c8e914af58ddbeaba - Fix linker error in clang-fuzzer following 89ea0b05207d45c145fb525df554b3b986ae379b

Looks like this breaks tests on mac: http://45.33.8.238/mac/15751/step_10.txt

Please take a look.

Looks like this breaks tests on mac: http://45.33.8.238/mac/15751/step_10.txt

Please take a look.

Checking now.

This is still broken; bots have been red for a few hours now. Can we revert and analyze async, to keep the bots green please?

This is still broken; bots have been red for a few hours now. Can we revert and analyze async, to keep the bots green please?

Yes I will revert, but I don't understand. All the tests pass here on four different machines, Windows and Linux, when compiling with different compilers and different configurations. I will investigate offline.

Hm http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/16533 is happy so it's not broken on all Windows machines. Guess I'll take a look on what's going on. You can hold off on the revert for now.

Hm http://lab.llvm.org:8011/builders/clang-x64-windows-msvc/builds/16533 is happy so it's not broken on all Windows machines. Guess I'll take a look on what's going on. You can hold off on the revert for now.

I wonder if the "sed" substitution below could fail on some machines? (see lld/test/COFF/pdb-relative-source-lines.test)

Reverted in rG2ae0df5be7408a79524743762b6c74953f31b805. I'll investigate the issue on my end.

I took a look. It consistently fails for me on mac and on Windows in a git bash shell, but it passes in cmd with unxutils. Maybe this needs something like eac56724fd955af0f8521557cacc57a83f371649 ?

dmajor added a subscriber: dmajor.Jun 22 2020, 9:15 AM
aganea updated this revision to Diff 272556.Jun 22 2020, 3:23 PM
aganea marked 2 inline comments as done.Jun 22 2020, 3:29 PM
aganea added a subscriber: uweigand.
aganea added inline comments.
clang/test/CodeGen/debug-info-codeview-buildinfo.c
5
lld/test/COFF/pdb-relative-source-lines2.test
44 ↗(On Diff #272556)

@thakis : I was unable to make this work otherwise than by restricting this test to windows (REQUIRES: system-windows at the top). This works now on Cygwin (git bash) and GnuWin32, but not on Linux. Would you mind if this test was Windows-specific?

Line 4 here fails on s390x but not on other Unix flavors, see: http://lab.llvm.org:8011/builders/clang-s390x-linux/builds/33346/steps/ninja%20check%201/logs/FAIL%3A%20Clang%3A%3Adebug-info-codeview-buildinfo.c

@thakis @uweigand Any ideas would could go wrong here?

Well, when I try it, the compile command is creating a native format object file, in this case a 64-bit big-endian s390x ELF file, and that format is not recognized by llvm-pdbutil. Not sure why this wouldn't fail the same way on any other non-Windows target, though.

I guess it should work if you pass the appropriate --target option to the compiler to force creation of a Windows object file.

Line 4 here fails on s390x but not on other Unix flavors, see: http://lab.llvm.org:8011/builders/clang-s390x-linux/builds/33346/steps/ninja%20check%201/logs/FAIL%3A%20Clang%3A%3Adebug-info-codeview-buildinfo.c

@thakis @uweigand Any ideas would could go wrong here?

Well, when I try it, the compile command is creating a native format object file, in this case a 64-bit big-endian s390x ELF file, and that format is not recognized by llvm-pdbutil. Not sure why this wouldn't fail the same way on any other non-Windows target, though.

I guess it should work if you pass the appropriate --target option to the compiler to force creation of a Windows object file.

Thanks for getting back! That was my suspicion. I initially thought that using clang-cl would automatically target windows-msvc, but it seems that's not the case, I'll add it.

When using clang-cl, this code should set it to the right format, on any platform: https://github.com/llvm/llvm-project/blob/master/clang/lib/Driver/Driver.cpp#L1071
However maybe something else happens on big endian architectures?

@uweigand Would you mind running the first clang-cl command with -### see what target is being used? This is what I'm seeing on my Ubuntu machine. Note the x86_64-pc-windows-msvc19.11.0 triple. If using plain clang -c b.cpp then x86_64-unknown-linux-gnu is used instead.

aganea@PC:/mnt/f/llvm-project$ cat b.cpp
int A;
int foo() {
  return A * 20;
}
int main() {
  return foo();
}
aganea@PC:/mnt/f/llvm-project$ ./buildclang9/bin/clang-cl /c b.cpp -###
clang version 11.0.0 (https://github.com/llvm/llvm-project.git bfdb834bc3d5e948002e5061a4c50d5c9a97de59)
Target: x86_64-pc-windows-msvc
Thread model: posix
InstalledDir: /mnt/f/llvm-project/./buildclang9/bin
 (in-process)
 "/mnt/f/llvm-project/buildclang9/bin/clang-11" "-cc1" "-triple" "x86_64-pc-windows-msvc19.11.0" "-emit-obj" "-mrelax-all" "-mincremental-linker-compatible" "-disable-free" "-main-file-name" "b.cpp" "-mrelocation-model" "pic" "-pic-level" "2" "-mthread-model" "posix" "-mframe-pointer=none" "-relaxed-aliasing" "-fmath-errno" "-fno-rounding-math" "-masm-verbose" "-mconstructor-aliases" "-munwind-tables" "-target-cpu" "x86-64" "-mllvm" "-x86-asm-syntax=intel" "-D_MT" "-flto-visibility-public-std" "--dependent-lib=libcmt" "--dependent-lib=oldnames" "-stack-protector" "2" "-fms-volatile" "-fdiagnostics-format" "msvc" "-dwarf-column-info" "-resource-dir" "/mnt/f/llvm-project/buildclang9/lib/clang/11.0.0" "-internal-isystem" "/mnt/f/llvm-project/buildclang9/lib/clang/11.0.0/include" "-fdeprecated-macro" "-fdebug-compilation-dir" "/mnt/f/llvm-project" "-ferror-limit" "19" "-fmessage-length" "0" "-fno-use-cxa-atexit" "-fms-extensions" "-fms-compatibility" "-fms-compatibility-version=19.11" "-std=c++14" "-fdelayed-template-parsing" "-fobjc-runtime=gcc" "-fdiagnostics-show-option" "-fcolor-diagnostics" "-faddrsig" "-o" "b.obj" "-x" "c++" "b.cpp"

Hmm, with clang-cl it seems the driver is trying to use this:
Target: s390x-pc-windows-msvc
which of course doesn't exist. Not sure what is supposed to be happening here, but it seems that it's falling back on s390x-linux since on s390x, Linux is currently the only supported OS.

Hmm, with clang-cl it seems the driver is trying to use this:
Target: s390x-pc-windows-msvc
which of course doesn't exist. Not sure what is supposed to be happening here, but it seems that it's falling back on s390x-linux since on s390x, Linux is currently the only supported OS.

I'm seeing some of the tests are setting the target explicitly %clang_cl --target=x86_64-windows-msvc. Would that work on your machine? Or should I do UNSUPPORTED: s390x ?

Hmm, with clang-cl it seems the driver is trying to use this:
Target: s390x-pc-windows-msvc
which of course doesn't exist. Not sure what is supposed to be happening here, but it seems that it's falling back on s390x-linux since on s390x, Linux is currently the only supported OS.

I'm seeing some of the tests are setting the target explicitly %clang_cl --target=x86_64-windows-msvc. Would that work on your machine? Or should I do UNSUPPORTED: s390x ?

Sorry, looks like I missed this. I think using an explicit target should work.

This seems to be breaking determinism on Windows builds, see https://crbug.com/1117026. Can this be reverted and fixed?

This seems to be breaking determinism on Windows builds, see https://crbug.com/1117026. Can this be reverted and fixed?

Will do.

aganea reopened this revision.Oct 14 2020, 7:56 AM
This revision is now accepted and ready to land.Oct 14 2020, 7:56 AM
aganea planned changes to this revision.Oct 14 2020, 7:57 AM
aganea added a subscriber: saudi.

Does the full path to the tool end up only in the object file, or does this make it all the way to the final executable or library?

The LF_BUILDINFO records for each .OBJ are emitted as they are in the final .PDB.

Does embedding full paths affect distributed builds or build reproducibility?

It does affect reproducibility for distributed builds, the contents of LF_BUILDINFO for a remotely-compiled .OBJ will be different in the final .PDB.
In our case, compilation jobs can be executed locally or remotely, depending on how they are queued.
For example,

  • If a local compilation, C:\.nuget\llvm.10.0.0.2\bin\clang-cl.exe would be stamped in the LF_BUILDINFO record.
  • If a remote compilation, C:\Users\SomeUser\AppData\Local\Temp\.fbuild.tmp\worker\toolchain.130589cdf35aed3b\clang-cl.exe would be stamped (the compiler executable is sent to the remote worker).

But you've got a good point. We would need an way to force the remote compiler to stamp the local path (which is unique for all users). We've got a manual override if the compiler path referred by LF_BUILDINFO can't be found locally, but this is a bit annoying, see: https://liveplusplus.tech/docs/documentation.html#FASTBuild.
It is better if the right information was there in the first place in the .OBJ. Could we do a remapping instead through VFS maybe? I'd like to avoid adding yet-a-new-flag to handle that.

I'm quite late into this ticket, but just to tell our solution, our distributed compilation system maps its own work directory to an I:\ drive and all our compiler/dependencies live inside the same folder, so locally something is in D:\dev\game\toolset\compiler\clang-cl.exe becomes I:\dev\game\toolset\compiler\clang-cl.exe then when copying the objects back the distribution system maps I:\ to D:\, which doesn't break the obj sizes.

Does embedding full paths affect distributed builds or build reproducibility?

It does affect reproducibility for distributed builds, the contents of LF_BUILDINFO for a remotely-compiled .OBJ will be different in the final .PDB.

I don't think the path to the compiler is the only concern regarding reproducible builds. Including the full command-line means that artifacts change when options that have no (other) effect on the output are added. E.g., adding -Wall will create a different object file than -Wmost. I worry this is the wrong direction for default clang behaviour.

Should this new instability be restricted to when users explicitly request it?

If you do go with "off-by-default", the natural driver option to use would be the existing -grecord-command-line. You could use the same plumbing through -cc1 that it does.

aganea added a subscriber: dexonsmith.EditedOct 10 2021, 12:37 PM

Does embedding full paths affect distributed builds or build reproducibility?

It does affect reproducibility for distributed builds, the contents of LF_BUILDINFO for a remotely-compiled .OBJ will be different in the final .PDB.

I don't think the path to the compiler is the only concern regarding reproducible builds. Including the full command-line means that artifacts change when options that have no (other) effect on the output are added. E.g., adding -Wall will create a different object file than -Wmost. I worry this is the wrong direction for default clang behaviour.

In my sense, the fact that an option doesn't change the output is orthogonal to producing a deterministic output. I think most caching systems treat the compiler as a black box, by simply hashing the command-line and assuming the same .OBJ will be generated from that command-line. The build system could filter out options that are known to have no effect on the output, to address your concern. But I feel this is outside of the scope of this patch?

Should this new instability be restricted to when users explicitly request it?

The goal of this patch (and D43002) is simply to be on-par with what MSVC generates by default. It seems a bit counter-intuitive to provide -grecord-command-line to clang-cl only to match the MSVC behavior, but if that's the general consensus, I'm fine as long as LF_BUILDINFO is supported in LLVM.

If you do go with "off-by-default", the natural driver option to use would be the existing -grecord-command-line. You could use the same plumbing through -cc1 that it does.

You mean using the llvm.commandline metadata? That's an interesting idea which should be explored.

aganea updated this revision to Diff 396804.Dec 31 2021, 8:15 AM

Rebase & simplify.

This revision is now accepted and ready to land.Dec 31 2021, 8:15 AM

@dexonsmith The issue with -grecord-command-line/-frecord-command-line is that it isn't producing a self-standing command-line. For example:

> clang-cl /c main.cpp /clang:-grecord-command-line /Z7

would record: d:\\git\\llvm-project\\stage1\\bin\\clang-cl.exe --driver-mode=cl -c main.cpp /clang:-grecord-command-line /Z7 -grecord-command-line

Whereas our patch records the cc1 command, which is self-standing:

> stage1\bin\clang-cl /c main.cpp /Z7 -no-canonical-prefixes -fdebug-compilation-dir=.
> stage1\bin\llvm-pdbutil.exe dump -types .\main.obj | grep -A5 LF_BUILDINFO
0x1008 | LF_BUILDINFO [size = 28]
         0x1003: `.`
         0x1006: `stage1\bin\clang-cl.exe`
         0x1004: `main.cpp`
         0x1005: ``
         0x1007: `"-cc1" "-fmessage-length=120" "-fdiagnostics-format" "msvc" "-ferror-limit" "19" "-fcolor-diagnostics" "-mllvm" "-x86-asm-syntax=intel" "-disable-free" "-emit-obj" "-x" "c++" "-target-cpu" "x86-64" "-tune-cpu" "generic" "-triple" "x86_64-pc-windows-msvc19.29.30138" "-resource-dir" "stage1\\lib\\clang\\14.0.0" "-isystem" "stage1\\lib\\clang\\14.0.0\\include" "-isystem" "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community\\VC\\Tools\\MSVC\\14.29.30133\\include" "-isystem" "C:\\Program Files (x86)\\Microsoft Visual Studio\\2019\\Community\\VC\\Tools\\MSVC\\14.29.30133\\atlmfc\\include" "-isystem" "C:\\Scoop\\global\\apps\\winsdk\\10.0.22000.194\\sdk\\Include\\10.0.22000.0\\ucrt" "-isystem" "C:\\Scoop\\global\\apps\\winsdk\\10.0.22000.194\\sdk\\Include\\10.0.22000.0\\shared" "-isystem" "C:\\Scoop\\global\\apps\\winsdk\\10.0.22000.194\\sdk\\Include\\10.0.22000.0\\um" "-isystem" "C:\\Scoop\\global\\apps\\winsdk\\10.0.22000.194\\sdk\\Include\\10.0.22000.0\\winrt" "-std=c++14" "-fmath-errno" "-fms-compatibility" "-fdelayed-template-parsing" "-pic-level" "2" "-stack-protector" "2" "-fvisibility" "default" "-fdeprecated-macro" "-fms-compatibility-version=19.29.30138" "-ffp-contract=on" "-fno-experimental-relative-c++-abi-vtables" "-O0" "-fdebug-compilation-dir=." "-fcoverage-compilation-dir=D:\\git\\llvm-project" "-faddrsig" "-fms-volatile" "-fno-use-cxa-atexit" "-gcodeview" "-gno-column-info" "-mrelax-all" "-mincremental-linker-compatible" "--mrelax-relocations" "-relaxed-aliasing" "-funwind-tables=2" "-mconstructor-aliases" "--dependent-lib=libcmt" "--dependent-lib=oldnames" "-flto-visibility-public-std" "-clear-ast-before-backend" "-debug-info-kind=constructor" "-fdiagnostics-hotness-threshold=0" "-D" "_MT"`

(the LF_BUILDINFO record is replicated unchanged in the .PDB)

Microsoft does something similair:

> cl /c /Z7 .\main.cpp
Microsoft (R) C/C++ Optimizing Compiler Version 19.29.30136 for x64
Copyright (C) Microsoft Corporation.  All rights reserved.

main.cpp
> llvm-pdbutil.exe dump -types .\main.obj | grep -B4 -A5 LF_BUILDINFO
0x1009 | LF_SUBSTR_LIST [size = 16]
         0x1007: `-c -Z7 -MT -IC:\Scoop\global\apps\vs_buildtools\16.11.5\VS\VC\Tools\MSVC\14.29.30133\ATLMFC\include -IC:\Scoop\global\apps\vs_buildtools\16.11.5\VS\VC\Tools\MSVC\14.29.30133\include -IC:\Scoop\global\apps\winsdk\10.0.22000.194\sdk\include\10.0.22000.0\ucrt`
         0x1008: ` -IC:\Scoop\global\apps\winsdk\10.0.22000.194\sdk\include\10.0.22000.0\shared -IC:\Scoop\global\apps\winsdk\10.0.22000.194\sdk\include\10.0.22000.0\um -IC:\Scoop\global\apps\winsdk\10.0.22000.194\sdk\include\10.0.22000.0\winrt -IC:\Scoop\global\apps\winsdk\10`
0x100A | LF_STRING_ID [size = 64] ID: 0x1009, String: .0.22000.194\sdk\include\10.0.22000.0\cppwinrt -TP -X
0x100B | LF_BUILDINFO [size = 28]
         0x1003: `D:\git\llvm-project`
         0x1004: `C:\Scoop\global\apps\vs_buildtools\16.11.5\VS\VC\Tools\MSVC\14.29.30133\bin\HostX64\x64\cl.exe`
         0x1005: `.\main.cpp`
         0x1006: `D:\git\llvm-project\vc140.pdb`
         0x100A: `.0.22000.194\sdk\include\10.0.22000.0\cppwinrt -TP -X`

MSVC splits the command over multiple string records, I suppose to increase the chances of de-duplication in the .PDB, something that we could do later. But the essence is that the command is free-standing (note -X and the explicit includes) and can be run without any env.vars set, unlike -dwarf-debug-flags or -record-command-line.

This revision was landed with ongoing or failed builds.Jan 19 2022, 4:44 PM
This revision was automatically updated to reflect the committed changes.