This is an archive of the discontinued LLVM Phabricator instance.

Force iteration over FieldDecls and RecordDecls in deterministic order.
ClosedPublic

Authored by zturner on Mar 20 2015, 11:54 PM.

Details

Summary

Previously we were, in many places in the code, using a DenseMap<> to hold FieldDecls representing class members, and RecordDecls representing bases and virtual bases.

In some instances, we would try to copy records by iterating over their fields and bases and adding each one to a new AST. This yields a non-deterministic ordering of the results, and leads to invalid record layout. Specifically, clang expects field index order to be in offset order, and the field index order is determined by the order you add items to a record. So when synthesizing a new AST, you *must* add items in field offset order. Therefore, copying by iterating over a DenseMap of FieldDecls or RecordDecls for bases and inserting them this way yields invalid record layout.

This bug is present on all platforms and all ABIs, but was only surfaced on Windows with the MS ABI record layout because of extra checks and asserts present in MicrosoftRecordLayoutBuilder::layoutField. Also, this requirement of importing in a deterministic order applies to bases and virtual bases too, not just fields.

The fix employed here is to change all occurrences of DenseMap<A,B> as it pertains to ASTs to std::vector<std::pair<A, B>>. There were only a very few instances of anyone using the DenseMaps for lookup, and given the typical number of fields a class has, it seems like a reasonable tradeoff to ensure that this kind of problem can never happen again. A std::vector<> should also require less memory than a DenseMap. I'm not sure if there are tons of SymbolFileDWARF::LayoutInfos lying around in memory, but if there are this could have an impact on memory footprint too.

Since this problem was present everywhere it's unclear how -- if at all -- it was manifesting or what types of issues this might fix for everyone else. The problem may have been masked if ASLR was disabled or depending on what allocator was used. On Windows it definitely fixes a whole slew of test failures. I know Linux has been seeing some non-deterministic failures as well, so one of the Linux people shoudl see if this patch makes a difference.

Diff Detail

Repository
rL LLVM

Event Timeline

zturner updated this revision to Diff 22407.Mar 20 2015, 11:54 PM
zturner retitled this revision from to Force iteration over FieldDecls and RecordDecls in deterministic order..
zturner updated this object.
zturner edited the test plan for this revision. (Show Details)
zturner added a subscriber: Unknown Object (MLST).
zturner added inline comments.Mar 21 2015, 12:00 AM
source/Expression/ClangASTSource.cpp
1543 ↗(On Diff #22407)

Since I didn't call it out specifically in the CL, this is the line that was introducing the non-determinism when a DenseMap is used.

It would be possible to fix this by only copying the elements of the DenseMap<> out to a temporary list and sorting the list, but since iterating in a non-deterministic order is perilous, I will propose this change first.

amccarth edited edge metadata.Mar 23 2015, 8:03 AM

Nice find.

source/Expression/ClangASTSource.cpp
1543 ↗(On Diff #22407)

Nit: I assume fi was and abbreviation for "field iterator". Since this is no longer and iterator and just (a reference to) the field, perhaps rename it.

rnk added a subscriber: rnk.Mar 23 2015, 10:33 AM

Nice! If you actually need fast lookup, you can use llvm::MapVector, but based on your analysis it looks like std::vector is more appropriate.

ovyalov edited edge metadata.Mar 23 2015, 2:28 PM

Do we need to use operator[] on vector? I'm wondering whether list can be used instead of vector.

source/Expression/ClangASTSource.cpp
1541 ↗(On Diff #22407)

Minor thing - destination_list.reserve(source_list.size())

source/Plugins/SymbolFile/DWARF/SymbolFileDWARF.cpp
2675 ↗(On Diff #22407)

Could you use cbegin and replace such a long type name with sth like
auto pos = layout_info.field_offsets.cbegin(), end = layout_info.field_offsets.cend();

zturner updated this revision to Diff 22535.Mar 23 2015, 5:59 PM
zturner edited edge metadata.

Fixed some of the issues that were pointed out. Will try to upload this tomorrow if there's no further comments.

This revision was automatically updated to reflect the committed changes.

Adding Sean as a reviewer now that he is back.

spyffe edited edge metadata.Mar 24 2015, 10:08 AM

The abstract interface for ExternalASTSource has a layoutRecordType() function (llvm/tools/clang/include/clang/AST/ExternalASTSource.h):

virtual bool layoutRecordType(
    const RecordDecl *Record, uint64_t &Size, uint64_t &Alignment,
    llvm::DenseMap<const FieldDecl *, uint64_t> &FieldOffsets,
    llvm::DenseMap<const CXXRecordDecl *, CharUnits> &BaseOffsets,
    llvm::DenseMap<const CXXRecordDecl *, CharUnits> &VirtualBaseOffsets);

We're conforming to that. I'm concerned that by changing the signature of layoutRecordType() in LLDB, you're causing our version to not be called at all. Unless I'm missing something, this doesn't look like the right fix.

As an aside, when laying out a RecordDecl, you have the ordering for its fields right in the RecordDecl. If MicrosoftRecordLayoutBuilder relying on the llvm::DenseMap for ordering, I think it's MicrosoftRecordLayoutBuilder that has the bug. By my understanding of how record layout works, it should only be using the map to go from fields/bases to offsets – nothing else.

This bug isn't specific to Microsoft layout. Clang as a general rule
requires fields to be added in offset order, and iterating over a dense map
doesn't give you that.

I'll let majnemer or rnk comment on the ExternalASTSource interface

Also the layout builder never gets the dense map. So Microsoft Layout
Builder doesn't know anything about the dense map. It's ClangASTSource that
iterates over the map, and all the layout builder sees is fields being
added in the wrong order

I see what you mean about the interface. I guess that settles that then,
the interface needs to be consistent. I'll add override keywords so the
compiler can catch that for me in the future.

But our implementation still needs to add fields in the correct order, so
iterating over the DenseMap is still not the right approach

I talked to some of the clang guys here and they said to go back to
DenseMaps to conform to the ExternalASTSource interface, but make sure you
iterate over fields and bases in offset order, which means sorting the list
before doing the insertion. So I will submit a patch to do that today.