This is an archive of the discontinued LLVM Phabricator instance.

[LLD][COFF] Survive empty and invalid PCH signature
ClosedPublic

Authored by aganea on Oct 26 2022, 5:53 AM.

Details

Summary

While trying to use LLD with the Unreal Engine - which happens to heavily use PCH files - it was failing with "error: xxx.obj claims to be a PCH object, but does not have a valid signature". It seems recent versions of MSVC (or at least the one we use) do not generate the PCH signature in the S_OBJNAME record anymore. Instead we need to parse the entire .PCH.OBJ file until we reach the LF_ENDPRECOMP record, which contains a valid PCH signature. The code already was able to survive without that, by falling back to name lookup, but it's more reliable to use the precompSourceMappings map for matching the corresponding PCH file.

Diff Detail

Event Timeline

aganea created this revision.Oct 26 2022, 5:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 26 2022, 5:53 AM
aganea requested review of this revision.Oct 26 2022, 5:53 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 26 2022, 5:53 AM
rnk accepted this revision.Oct 26 2022, 9:15 AM

So, the signature moved from S_OBJNAME to LF_ENDPRECOMP, but everything else more or less stays the same.

Sounds good to me.

This revision is now accepted and ready to land.Oct 26 2022, 9:15 AM
thieta accepted this revision.Oct 26 2022, 9:21 AM

LGTM

Thanks for the review!

So, the signature moved from S_OBJNAME to LF_ENDPRECOMP, but everything else more or less stays the same.

Sounds good to me.

Actually the code supports both now, depending on the version of the MSVC compiler.

I'll hold on this for a bit, I'm still seeing weird warnings and the .PDB is only 450 MiB with LLD -- instead of 2 GiB with link.exe. I think the issues are unrelated, but i'd like to get to the bottom of it.

aganea updated this revision to Diff 471621.Oct 28 2022, 12:51 PM
aganea retitled this revision from [LLD][COFF] Survive empty PCH signature to [LLD][COFF] Survive empty and invalid PCH signature.
aganea edited the summary of this revision. (Show Details)

I finally found some time to look at this. The issue is more complicated than I thought, and the solution is not the same for GHASH than non-GHASH.

Another issue was that .PCH.OBJs are non-deterministic by default. Example:

>cat precomp.cpp
#include "precomp.h"

>cl /Yc /Z7 /c precomp.cpp /nologo
precomp.cpp

>llvm-pdbutil dump --all precomp.obj > dump.txt

>cl /Yc /Z7 /c precomp.cpp /nologo
precomp.cpp

>llvm-pdbutil dump --all precomp.obj > dump2.txt

>diff dump.txt dump2.txt
17991c17991
< 0x2382 | LF_ENDPRECOMP [size = 8] signature = 0x522700AA
---
> 0x2382 | LF_ENDPRECOMP [size = 8] signature = 0x2E1E4F09
18020c18020
<        0 | S_OBJNAME [size = 40] sig=1378287786, `C:\git\llvm-project\precomp.obj`
---
>        0 | S_OBJNAME [size = 40] sig=773738249, `C:\git\llvm-project\precomp.obj`

There was a situation where our .PCH.OBJs were always recompiled on a clean rebuild, but not the .OBJs (they were cached). MSVC link.exe was able to link without warnings or errors. When I first implemented the PCH support, I initially thought the PCH signature was a hard requirement, but it seems it's just a hint for the runtime. I relaxed that requirement in LLD, but now we'd still fail if the LF_ENDPRECOMP index doesn't line up with the data in LF_PRECOMP.

Also worth noting that using MSVC's cl /Yc /experimental:deterministic .. makes the PCH signature deterministic.

Would you mind taking a second look please?

aganea requested review of this revision.Oct 28 2022, 12:51 PM
rnk accepted this revision.Oct 31 2022, 9:59 AM

Thanks, looks good.

There was a situation where our .PCH.OBJs were always recompiled on a clean rebuild, but not the .OBJs (they were cached).

I'm imagining some system outside the build system forcing the .pch.obj to be recompiled, but the build system (fastbuild?) has correct and complete knowledge that none of the PCH inputs changed.

lld/COFF/DebugTypes.cpp
319

Maintaining two codepaths for type merging has significant costs, but I think the ghash code path is too optimized, and too difficult to understand. Perhaps with more work and cleanups, it could be made straightforward enough that we could afford to delete the regular type merging codepath, but for now I guess we should continue maintaining both.

lld/COFF/DebugTypes.h
116–120

Part of me wants to say, make it a TypeIndex to eliminate this subtlety, but the new usage uses it as a count, so I don't think it makes sense.

This revision is now accepted and ready to land.Oct 31 2022, 9:59 AM
thieta accepted this revision.Nov 2 2022, 5:06 AM

LGTM

This revision was landed with ongoing or failed builds.Nov 20 2022, 7:40 AM
This revision was automatically updated to reflect the committed changes.