This is an archive of the discontinued LLVM Phabricator instance.

[COFF][LLD] Add link support for precompiled headers .objs
ClosedPublic

Authored by aganea on Apr 3 2018, 9:42 AM.

Details

Summary

This change allows for link-time merging of debugging information from
Microsoft precompiled types .objs compiled with cl.exe /Z7 /Yc and /Yu.

Previously, LLD used to crash when encountering these .objs on the command line.

Precompiled types use two specific records:

  • LF_PRECOMP at the begining of a type stream, to signify a dependence on a

external precomp .obj

  • LF_ENDPRECOMP in a precomp .obj, which marks the end of the type stream which

should be used by the dependent .obj.

When a .obj with a LF_PRECOMP reference is found, we preempt the processing of
that .obj, and attempt to first load the precompiled .obj. Once the precompiled
.obj is loaded and remapped, we simply take its (remapped) type IndexMap and
paste it in the TypeStreamMerger, just before merging in the dependent .obj's
type indices. This allows for back-referencing precomp type records which have
already merged.

LF_PRECOMP and LF_ENDPRECOMP records are ultimately dropped from the target PDB.

This fixes https://bugs.llvm.org/show_bug.cgi?id=34278

Diff Detail

Event Timeline

aganea created this revision.Apr 3 2018, 9:42 AM
smeenai added a subscriber: smeenai.Apr 3 2018, 9:48 AM
rnk added a comment.Apr 3 2018, 10:58 AM

Can you split this into a patch that adds the basic YAML serialization and dumping functionality to LLVM, and a patch that adds the type merging capabilities across LLVM and LLD? The first one should take care of a lot of the boilerplate and we can stamp it quickly and then move on to review the complex type merging code.

aganea updated this revision to Diff 142078.Apr 11 2018, 2:51 PM
aganea edited the summary of this revision. (Show Details)

Following https://reviews.llvm.org/D45283, I've updated the review only with the type merging and LLD support.
Some extra support would be needed for llvm-readobj -codeview-merged-types, but I prefer to do that later.

lld/COFF/PDB.cpp
303 ↗(On Diff #142078)

I choose to preempt loading of the current .obj, and load the precomp .obj here. However if (future) multithreading is a concern, I could instead push the processing of the current .obj at the end of the stack. Let me know whichever you prefer.

312 ↗(On Diff #142078)

The magic happens here.

aganea added inline comments.Apr 11 2018, 3:19 PM
llvm/lib/DebugInfo/CodeView/TypeStreamMerger.cpp
488 ↗(On Diff #142078)

I'm not very happy about this - I think I'll move it to lld/COFF/PDB.cpp instead.

aganea marked an inline comment as done.Apr 12 2018, 6:04 AM
aganea added inline comments.
llvm/lib/DebugInfo/CodeView/TypeStreamMerger.cpp
488 ↗(On Diff #142078)

After thought, I'll leave it the way it is - I can't find a better way to prevent the LF_ENDPRECOMP from being added to the output TypeTable.

Sorry for the delay, been OOO for 3 days.

BTW, When you upload the next version, can you make sure it has full context?

lld/COFF/PDB.cpp
258 ↗(On Diff #142078)

Variable names should be CamelCase.

293 ↗(On Diff #142078)

Would it be possible to write this as something like:

if (isExternalTypeStreamRecord(Types))
  return handleExternalTypeStreamRecord(Types);

Then this new handle function could handle both LF_TYPESERVER2 and LF_PRECOMP, and we wouldn't need the weird function with the boolean anymore

500–501 ↗(On Diff #142078)

This doesn't look clang formatted.

506 ↗(On Diff #142078)

Upper case F

507–510 ↗(On Diff #142078)

Same here. Just fix all variable names actually.

542–544 ↗(On Diff #142078)

Eww. How about returning an Expected<const CVIndexMap&> instead?

llvm/lib/DebugInfo/CodeView/TypeStreamMerger.cpp
489–492 ↗(On Diff #142078)

I noticed that when precompiled headers are used, the S_OBJNAME symbol also has a matching signature. Can / should this be used somehow?

aganea removed rL LLVM as the repository for this revision.Jul 3 2018, 2:18 PM
ruiu added inline comments.Jul 5 2018, 11:41 AM
lld/COFF/PDB.cpp
258 ↗(On Diff #142078)

Generally we avoid optional parameters. Please always explicitly pass true or false.

464 ↗(On Diff #142078)

Please add a function comment explaining what this function is for.

This function seems a bit too long to me. Please consider splitting into smaller functions.

aganea updated this revision to Diff 154301.Jul 5 2018, 1:24 PM
aganea marked an inline comment as done.

Adressed @zturner's and @ruiu's comments.

aganea marked 9 inline comments as done.Jul 5 2018, 1:39 PM
aganea added inline comments.
lld/COFF/PDB.cpp
293 ↗(On Diff #142078)

Unfortunately we can't. In one case (PDB), we must return the full TPI/IPI map; in the other case (PCH) we must load the TPI map and fallthrough.

542–544 ↗(On Diff #142078)

Please unsee this.

llvm/lib/DebugInfo/CodeView/TypeStreamMerger.cpp
489–492 ↗(On Diff #142078)

Yes, and the new revision (https://reviews.llvm.org/D45213?id=154301) is using it. It was probably meant there at the begining of the stream to bail out quickly, if the associated signature doesn't match the PCH session.

aganea planned changes to this revision.Aug 3 2018, 12:51 PM
aganea marked 2 inline comments as done.

Currently splitting this in many smaller changes.

aganea updated this revision to Diff 167531.Sep 28 2018, 12:28 PM

I've corrected the issues raised. Anything that wasn't related to this change was already commited separately.

This patch is now ready for review again.

Ping! Could anybody please take a look at this? Thank you!

How big are the object files?

lld/trunk/COFF/PDB.cpp
326–358

FWIW, all object files always start with an S_OBJNAME, even if there is no PCH support. But if there is no PCH support, the signature field will just be 0.

355

I think we should just return 0; here. As you said, if it's there, it will be first, so if it's not first, there's no point to continue looking in subsequent .debug$S sections.

425–426

I almost wonder if we should add CVTypeArray::drop_front(). You don't have to do it in this patch, just something I thought of.

631–632

Can you add static to this function?

640

In another patch we were discussing whether we should stop using path::Style::windows. When cross-compiling you might be on non-Windows, and these path comparisons would be incorrect in that case. So I think we should probably just use the default native style here.

643–644

It's hard to know exactly what to do here. Is equals_lower correct or equals? It seems like we need a sys::path::equals_path() that uses case-insensitive comparison on Windows and case-sensitive on Linux. Can you add equals_path() as a global function in this file and use that here instead?

lld/trunk/test/COFF/precomp-link.test
11

The # signs are confusing. I don't think you need to put any marker in front of them. You can just write it as code. lit and FileCheck ignore comment markers at the beginning of lines. and only match or run lines that have a RUN or CHECK line after any comments. So just putting raw code directly in the file is fine.

BTW, sorry this took so long for anyone to review. I think partly it's because nobody here really taken the time to understand in detail how MS precompiled headers works, so we were all hoping someone else would review it. :-/

Anyway, at least from my point of view (unless someone else chimes in) this looks good provided it works (which you say it does) and it has tests (which it does) and the last remaining issues I outlined are fixed.

neerajksingh accepted this revision.Oct 19 2018, 5:14 PM

I just want to mention as a heads-up: there will later be some complexities in this PCH finding code when the /deterministic flags are added to public builds of cl.exe. PCH files paths might be replaced with environment variables that will need to be expanded when acquiring a PDB.

FYI, in the MS linker, we find out early on that a file contains precompiled type information from the presence of the debug$P section and mostly process those files before processing the non-PCT files. That avoids some need to search when acquiring the PCH data.

This revision is now accepted and ready to land.Oct 19 2018, 5:14 PM
rnk added a comment.Oct 22 2018, 2:16 PM

FYI, in the MS linker, we find out early on that a file contains precompiled type information from the presence of the debug$P section and mostly process those files before processing the non-PCT files. That avoids some need to search when acquiring the PCH data.

Thanks, that's interesting.

llvm/trunk/lib/DebugInfo/CodeView/TypeStreamMerger.cpp
100

I was hoping to come up with some way to avoid the extra out parameter, but that's my only real concern. Structurally, everything else looks like type servers, and it's pretty straightforward.

rnk added a comment.Oct 22 2018, 2:17 PM

I meant to add more before hitting submit...

I looked at the linker part, and things make sense to me. I trust that Zach looked more at the mechanics. I think we should fix the remaining issues he highlighted, and otherwise this is ready.

aganea updated this revision to Diff 170971.EditedOct 24 2018, 1:39 PM
aganea marked 7 inline comments as done.

Please see the updated diff which adresses comments.

How big are the object files?

65.6 KB. I want to ensure things work smoothly with cl-generated OBJs.

lld/trunk/COFF/PDB.cpp
425–426

I went ahead and added VarStreamArray::drop_front(), please let me know if that sounds good.

llvm/trunk/lib/DebugInfo/CodeView/TypeStreamMerger.cpp
100

I agree - in a subsequent chage we could fold the "merge" behavior into a single function, add a initializer struct, and call the function with a initializer list instead, ie. mergeRecords({Dest, Types, Hashes, EP}). That way we could accomodate different merging scenarios, and have only one codepath.

aganea updated this revision to Diff 171104.EditedOct 25 2018, 8:07 AM

Update diff again:

  • Added more coverage tests.
  • Added comment to clarify the sys::path::Style::windows and precomp OBJs situation when cross-compiling.

FWIW, all object files always start with an S_OBJNAME, even if there is no PCH support. But if there is no PCH support, the signature field will just be 0.

Just wanted to point out that it's only true for cl-generated OBJs. clang-cl does not generate the S_OBJNAME record.

lld/trunk/COFF/PDB.cpp
301

@zturner I've added this comment.

rnk accepted this revision.Oct 29 2018, 5:34 PM

FWIW, all object files always start with an S_OBJNAME, even if there is no PCH support. But if there is no PCH support, the signature field will just be 0.

Just wanted to point out that it's only true for cl-generated OBJs. clang-cl does not generate the S_OBJNAME record.

True. I added S_BUILDINFO the other week.

I think @zturner had a stalled patch to add S_OBJNAME. Yeah, this: https://reviews.llvm.org/D43002. I think the lesson is that it might be better not to stuff more things into DICompileUnit than absolutely necessary.

Is there anything else that needs to be done here? I'm happy with this.

This revision was automatically updated to reflect the committed changes.