[PDB] Fix linking of function symbols and local variables. The compiler outputs PROC32_ID symbols into the object files for functions, and these symbols have an embedded type index which, when copied to the PDB, refer to the IPI stream. However, the symbols themselves are also converted into regular symbols (e.g. S_GPROC32_ID -> S_GPROC32), and type indices in the regular symbol records refer to the TPI stream. So this patch applies two fixes to function records. 1. It converts ID symbols to the proper non-ID record type. 2. After remapping the type index from the object file's index space to the PDB file/IPI stream's index space, it then remaps that index to the TPI stream's index space by. Besides functions, during the remapping process we were also discarding symbol record types which we did not recognize. In particular, we were discarding S_BPREL32 records, which is what MSVC uses to describe local variables on the stack. So this patch fixes that as well by copying them to the PDB.
Details
Diff Detail
Event Timeline
lld/COFF/PDB.cpp | ||
---|---|---|
333 | This thing isn't really canonicalizing the symbol kind anymore, it's rewriting the symbol record. Maybe name it something like rewriteSymbolForPDB? | |
347 | We need an error check for OOB IPI index | |
348–351 | In theory, we could do the same discoverTypeIndices trick that we do above to find the type index reference. Alternatively, if we know it's an LF_FUNC_ID, we should load the appropriate offset. Are these ever LF_MFUNC_IDs, though? | |
486 | Let's keep this as part of the copySymbolForPdb. | |
llvm/lib/DebugInfo/CodeView/TypeIndexDiscovery.cpp | ||
399 | Maybe add S_BPREL32 before this? They're related, BPRELSYM32 vs REGREL32. | |
llvm/tools/llvm-pdbutil/MinimalSymbolDumper.cpp | ||
740–749 | Remove the FIXME, it's done now. :) |
lld/COFF/PDB.cpp | ||
---|---|---|
347 | That should have already happened when we merged the type streams. Wouldn't we have discarded any records that have invalid type index references, rather than emitting them with bad indices? | |
348–351 | We could just load the appropriate offset, but that would require hardcoding the offset, which is duplication of magic numbers that I'd like to avoid. I chose to do it this way because it makes the code more readable and is ultimately just copying some fields. |
lld/COFF/PDB.cpp | ||
---|---|---|
486 | I actually think it should be here. I tried moving it into copySymbolForPdb and the code becomes really ugly. You end up having to duplicate a lot of the logic from remapTypesInSymbolRecord but forcing it to remap into the TPI stream's index space. By doing it this way, each step is logically independent. copy symbol literally just copies some memory like the comment says. remap types does exactly what it says, and so does canonicalize. I can call it remapSymbolKind or something like that if you prefer, but I kind of like the idea of keeping it as a sequence of 3 independent steps like this. |
lld/COFF/PDB.cpp | ||
---|---|---|
333 | I still like rewriteSymbolForPDB | |
347 | Yeah, that makes sense. Any invalid item index would've become the simple "not translated" type index. | |
349 | These can be LF_MFUNC_ID records, we need to handle that. | |
486 | Right, I agree, it should happen after remapping. Let's call it something else, though. We're doing a lot more than canonicalizing the symbol kind. Ultimately, we're going to need logic here for filtering out S_UDT records. | |
487 | Rather than having this assert, let's remove the return type from canonicalizeSymbolKind and do SymbolKind NewKind = symbolKind(NewData);. It's one load. | |
488–490 | These != assertions are trivial, IMO. They also don't represent internal errors, we emitted those records for a while, and cvdump still accepted our PDBs. | |
llvm/lib/DebugInfo/CodeView/TypeIndexDiscovery.cpp | ||
399 | ping? |
Only one comment, but it does need to be addressed before committing. I'm surprised we don't hit this case anywhere in the LLD test suite. I guess we mostly have C test case inputs, not C++.
lld/COFF/PDB.cpp | ||
---|---|---|
365 | This would fail if it were an LF_MFUNC_ID |
This thing isn't really canonicalizing the symbol kind anymore, it's rewriting the symbol record. Maybe name it something like rewriteSymbolForPDB?