This is an archive of the discontinued LLVM Phabricator instance.

Fix use after free in PDB linker.
ClosedPublic

Authored by zturner on Feb 27 2018, 2:37 PM.

Details

Summary

When we load a type server PDB, we do so by creating an instance of NativeSession. This internally creates a BumpPtrAllocator which it owns. When we're remapping types, if we find a type record that doesn't contain any indices, we decide that since it doesn't need to be remapped, we can simply use the original buffer. This will be a buffer pointing into the BumpPtrAllocator's storage for the NativeSession in question.

Where this goes wrong is that when we're done with the type server PDB, we close it. This de-allocates all the memory that was owned by its BumpPtrAllocator, leading to some records containing freed data.

The fix here is to provide an overload which allows creating a NativeSession with a shared BumpPtrAllocator. This way we can use the single global allocator that is already used for the rest of the PDB linking operation.

This fixes https://bugs.llvm.org/show_bug.cgi?id=36455

Diff Detail

Repository
rL LLVM

Event Timeline

zturner created this revision.Feb 27 2018, 2:37 PM
rnk added a comment.Feb 27 2018, 2:49 PM

How many bytes does a PDB session allocator end up holding? Try the debug CRT PDB, since it's the most common one. I suspect it holds a ton of temporary data that we really should throw away after closing the type server PDB.

A different way of fixing this would be to disable the optimization that skips copying records without type indices.

Or, why are these non-remapped records even being copied into BumpPtrAllocator memory? Why don't they come directly from the PDB's MemoryBuffer? Can we extend the life of the memory mapped PDB instead? That memory is shared with the FS cache, and is much cheaper to keep around.

In D43834#1021297, @rnk wrote:

How many bytes does a PDB session allocator end up holding? Try the debug CRT PDB, since it's the most common one. I suspect it holds a ton of temporary data that we really should throw away after closing the type server PDB.

A different way of fixing this would be to disable the optimization that skips copying records without type indices.

Or, why are these non-remapped records even being copied into BumpPtrAllocator memory? Why don't they come directly from the PDB's MemoryBuffer? Can we extend the life of the memory mapped PDB instead? That memory is shared with the FS cache, and is much cheaper to keep around.

I suspect it is due to records which are discontiguous. Whenever we encounter one of those, we have to make a copy and piece it together so that it will be contiguous. I can confirm that the number of bytes is small though.

In D43834#1021297, @rnk wrote:

How many bytes does a PDB session allocator end up holding? Try the debug CRT PDB, since it's the most common one. I suspect it holds a ton of temporary data that we really should throw away after closing the type server PDB.

A different way of fixing this would be to disable the optimization that skips copying records without type indices.

Or, why are these non-remapped records even being copied into BumpPtrAllocator memory? Why don't they come directly from the PDB's MemoryBuffer? Can we extend the life of the memory mapped PDB instead? That memory is shared with the FS cache, and is much cheaper to keep around.

I suspect it is due to records which are discontiguous. Whenever we encounter one of those, we have to make a copy and piece it together so that it will be contiguous. I can confirm that the number of bytes is small though.

Yea, I confirmed it all comes from discontiguous records. Linking llvm-xray.exe, with the patch, the main PDB allocator consumes 115,717,444 bytes. Without the patch, the main allocator consumes 114,185,356 bytes and the type server allocator consumes 1,532,088 bytes. So this is around 1% of total memory usage. We can try to do funny things to lower the memory usage, but it seems simpler and less error prone to just do it this way, and the penalty is pretty small.

rnk accepted this revision.Feb 27 2018, 4:14 PM

Thanks, looks good!

This revision is now accepted and ready to land.Feb 27 2018, 4:14 PM
zturner added a comment.EditedFeb 27 2018, 4:17 PM

Actually now I'm second guessing myself. This fixes the case where we freed the BumpPtrAllocator used for discontiguous records, but when we close the NativeSession object, we also close the mapped view of the file, which invalidates any pointer to a record which wasn't discontiguous.

I think a better fix is to simply keep a strong reference to the type server PDB's session until we're done.

To clarify, this is only an issue when using type server PDBs, right? In other words, if we're purely cross-compiling using clang and lld-link, we wouldn't run into this, since clang-cl doesn't use type server PDBs (in other words, it does /Z7 instead of /Zi)?

To clarify, this is only an issue when using type server PDBs, right?

Yes

In other words, if we're purely cross-compiling using clang and lld-link, we wouldn't run into this, since clang-cl doesn't use type server PDBs (in other words, it does /Z7 instead of /Zi)?

No. clang-cl doesn't *generate* type server PDBs, so its' true that all debug info *generated by clang-cl* will not experience this. But you will need to link against the CRT, and that will pull in some variant of libcmt.pdb, and that is where you will hit this bug.

zturner updated this revision to Diff 136192.Feb 27 2018, 4:44 PM

Rewrite this patch to keep a strong reference to type server PDBs. This is much simpler than the previous solution, while also being more robust.

To clarify, this is only an issue when using type server PDBs, right?

Yes

In other words, if we're purely cross-compiling using clang and lld-link, we wouldn't run into this, since clang-cl doesn't use type server PDBs (in other words, it does /Z7 instead of /Zi)?

No. clang-cl doesn't *generate* type server PDBs, so its' true that all debug info *generated by clang-cl* will not experience this. But you will need to link against the CRT, and that will pull in some variant of libcmt.pdb, and that is where you will hit this bug.

We're linking against the dynamic CRT, and my VS 2017 install doesn't include PDBs for the dynamic CRT libraries (e.g. I see libcmt.pdb, but not anything for msvcrt). We were also not copying over those PDBs to our cross-compilation sysroot anyway, though that was just an oversight on our part and not anything intentional (but if they don't have PDBs for the dynamic CRT libraries, it's irrelevant for us anyway).

aganea accepted this revision.Feb 28 2018, 9:20 AM

Tested & verified on our use-case.

Some stats:

                    PDB servers linked-in [1] | Target PDB size | Mem used [2] | Mem w/o strong ref [3]
Project"A" -                  104             |      360 Mb     |    3.1 Gb    |      *crash*
Project"B" -                   70             |      1.8 Gb     |     18 Gb    |       17.8 Gb
Project"C" -                  112             |      865 Mb     |    6.2 Gb    |      *crash*

[1] LoadedPDBs.size()
[2] Memory used by LLD just before calling PDB.commit();
[3] w/o the line LoadedPDBs.push_back(std::move(*ExpectedSession)); :

This revision was automatically updated to reflect the committed changes.