This is an archive of the discontinued LLVM Phabricator instance.

Fix the encoding and decoding of UniqueCStringMap<T> objects when saved to cache files.
ClosedPublic

Authored by clayborg on Apr 27 2022, 4:38 PM.

Details

Summary

A UniqueCStringMap<T> object is a std::vector of UniqueCStringMap::Entry objects, where each Entry contains a ConstString + T. The values in the vector are sorted first by ConstString and then by the T value. ConstString objects are simply uniqued "const char *" values, and when we compare them we use the actual string pointer as the value we sort by. This caused a problem when we saved the symbol table name indexes and debug info indexes to disk in one process, where they were sorted, and then decoded them from the cache files in another process. Why? Because the order in which the ConstString objects are created is now completely different, so the string pointers will no longer be sorted in the process the cache was loaded into.
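To make the failure mode concrete, here is a minimal sketch, not LLDB code. ToyPool is a hypothetical stand-in for one process's ConstString pool, and it assumes a simple model where a string interned earlier gets a lower address. Names that are pointer-sorted in the "saving" pool come back out of order once they are re-interned in a "loading" pool that created its strings in a different order.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <map>
#include <string>
#include <vector>

// Hypothetical stand-in for one process's ConstString pool (not LLDB's).
// Strings are copied into a fixed buffer, so a string interned earlier
// always has a lower address than one interned later.
class ToyPool {
public:
  ToyPool() : m_buffer(1024) {}

  const char *Intern(const std::string &s) {
    auto it = m_index.find(s);
    if (it != m_index.end())
      return it->second; // already uniqued
    assert(m_used + s.size() + 1 <= m_buffer.size());
    char *dst = m_buffer.data() + m_used;
    std::memcpy(dst, s.c_str(), s.size() + 1);
    m_used += s.size() + 1;
    m_index[s] = dst;
    return dst;
  }

private:
  std::vector<char> m_buffer; // never resized, so pointers stay valid
  std::size_t m_used = 0;
  std::map<std::string, const char *> m_index;
};

static bool PointerLess(const char *a, const char *b) {
  return std::less<const char *>()(a, b);
}

// Returns true if names sorted by pointer in one pool are still sorted by
// pointer after being re-interned, in the same order, in a second pool.
bool SortedOrderSurvivesReload() {
  ToyPool saver, loader;

  // The "saving process" happens to create its strings in this order.
  const std::vector<std::string> names = {"main", "foo", "bar", "puts"};
  std::vector<const char *> saved;
  for (const auto &n : names)
    saved.push_back(saver.Intern(n));
  std::sort(saved.begin(), saved.end(), PointerLess);

  // The cache stores the strings themselves, in pointer-sorted order.
  const std::vector<std::string> cache(saved.begin(), saved.end());

  // The "loading process" already created the same strings in another order.
  const std::vector<std::string> other_order = {"puts", "bar", "foo", "main"};
  for (const auto &n : other_order)
    loader.Intern(n);

  std::vector<const char *> loaded;
  for (const auto &s : cache)
    loaded.push_back(loader.Intern(s));
  return std::is_sorted(loaded.begin(), loaded.end(), PointerLess);
}
```

In this model SortedOrderSurvivesReload() returns false: the decoded entries arrive in the saver's pointer order, which is the reverse of the loader's. In a real process the addresses are not this predictable, but the same reasoning applies.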

The unit tests created for the initial patch didn't catch the encoding and decoding issues of UniqueCStringMap<T> because encoding and decoding happened in the same process, so decoding would end up creating sorted UniqueCStringMap<T> objects due to the constant string pool being exactly the same.

This patch sorts the UniqueCStringMap<T> after its entries are decoded, and also reserves the right number of entries in UniqueCStringMap::m_map before adding them all, to avoid multiple allocations.
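The decode path described here could be sketched roughly as follows. This is an illustration under stated assumptions, not the patch itself: ToyCStringMap, Intern and Decode are hypothetical names, and the cache is modeled as a plain vector of (string, value) pairs rather than LLDB's actual encoding.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <functional>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for the ConstString pool of the current process.
const char *Intern(const std::string &s) {
  static std::set<std::string> pool; // node-based: element addresses are stable
  return pool.insert(s).first->c_str();
}

// Hypothetical, simplified stand-in for UniqueCStringMap<T>.
template <typename T> class ToyCStringMap {
public:
  struct Entry {
    const char *name; // uniqued pointer, only meaningful in this process
    T value;
  };

  void Reserve(std::size_t n) { m_map.reserve(n); }
  void Append(const char *name, const T &value) {
    m_map.push_back({name, value});
  }

  // Sort by string pointer first, then by value.
  void Sort() {
    std::sort(m_map.begin(), m_map.end(), [](const Entry &a, const Entry &b) {
      if (a.name != b.name)
        return std::less<const char *>()(a.name, b.name);
      return a.value < b.value;
    });
  }

  // Binary search by pointer; only valid after Sort().
  const T *Find(const char *name) const {
    auto it = std::lower_bound(
        m_map.begin(), m_map.end(), name, [](const Entry &e, const char *n) {
          return std::less<const char *>()(e.name, n);
        });
    return (it != m_map.end() && it->name == name) ? &it->value : nullptr;
  }

private:
  std::vector<Entry> m_map;
};

// Decode (string, value) pairs as read from a cache: reserve once, re-intern
// every string in this process, then re-sort by the new pointer values.
template <typename T>
ToyCStringMap<T> Decode(const std::vector<std::pair<std::string, T>> &encoded) {
  ToyCStringMap<T> map;
  map.Reserve(encoded.size()); // one allocation instead of repeated growth
  for (const auto &kv : encoded)
    map.Append(Intern(kv.first), kv.second);
  map.Sort(); // the saved order is meaningless for this process's pointers
  return map;
}

// Usage: every cached name can be looked up after decoding.
bool DecodeExampleFindsAllNames() {
  Intern("puts"); // this process created some strings in its own order
  Intern("main");
  const std::vector<std::pair<std::string, int>> encoded = {
      {"main", 0}, {"foo", 1}, {"bar", 2}, {"puts", 3}};
  const ToyCStringMap<int> map = Decode(encoded);
  for (const auto &kv : encoded) {
    const int *v = map.Find(Intern(kv.first));
    if (!v || *v != kv.second)
      return false;
  }
  return true;
}
```

Sorting once after all entries are appended keeps decoding at one allocation plus one sort, instead of maintaining order on every insertion.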

Added a unit test that loads an object file from YAML. I created a cache file for the original file and removed the cache file's signature mod time check, since we generate an object file from the YAML and use that as the object file for the Symtab object. We then load the cache data from the array of symtab cache bytes, so that the ConstString "const char *" values will not match the current process, and verify that we can look up the 4 names from the object file in the symbol table.

Diff Detail

Event Timeline

clayborg created this revision. Apr 27 2022, 4:38 PM
Herald added a project: Restricted Project. Apr 27 2022, 4:38 PM
clayborg requested review of this revision. Apr 27 2022, 4:38 PM
yinghuitan accepted this revision. Apr 27 2022, 10:32 PM

This is a great finding! The change looks good. Some questions though:

  • Do you have any theory why we only see this issue on Mac and not on Linux? (For anyone else reading this: I found this bug while testing on Mac, but the same reproduction steps do not trigger it on Linux.)
  • Do we have data structures in other parts of the caching feature that need a similar change?
This revision is now accepted and ready to land. Apr 27 2022, 10:32 PM
labath accepted this revision. Apr 28 2022, 12:37 AM

Also, to help diagnose this kind of issue in the future, it may be worth adding an extra debug-mode check in UniqueCStringMap to ensure the entries are sorted. For example, in debug mode, add an m_isSorted flag that is set by the UniqueCStringMap::Sort() method. All the binary search methods should assert m_isSorted == true.
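A rough sketch of what such a check might look like, assuming a simplified map rather than the real UniqueCStringMap: CheckedCStringMap and IsSorted() are hypothetical names, and only m_isSorted comes from the suggestion above. For brevity the flag is kept in all builds here; it could instead be wrapped in #ifndef NDEBUG so that it exists only in debug builds.

```cpp
#include <algorithm>
#include <cassert>
#include <functional>
#include <vector>

// Hypothetical, simplified map that tracks whether it is currently sorted.
template <typename T> class CheckedCStringMap {
public:
  struct Entry {
    const char *name;
    T value;
  };

  void Append(const char *name, const T &value) {
    m_map.push_back({name, value});
    m_isSorted = false; // appending may break the sort order
  }

  void Sort() {
    std::sort(m_map.begin(), m_map.end(), [](const Entry &a, const Entry &b) {
      if (a.name != b.name)
        return std::less<const char *>()(a.name, b.name);
      return a.value < b.value;
    });
    m_isSorted = true;
  }

  // Binary search by pointer; asserts that Sort() ran after the last Append().
  const T *Find(const char *name) const {
    assert(m_isSorted && "binary search on an unsorted map");
    auto it = std::lower_bound(
        m_map.begin(), m_map.end(), name, [](const Entry &e, const char *n) {
          return std::less<const char *>()(e.name, n);
        });
    return (it != m_map.end() && it->name == name) ? &it->value : nullptr;
  }

  bool IsSorted() const { return m_isSorted; }

private:
  std::vector<Entry> m_map;
  bool m_isSorted = true; // an empty map is trivially sorted
};

// Usage: the flag is cleared by Append() and set again by Sort().
bool SortFlagExample() {
  const char *foo = "foo";
  const char *bar = "bar";
  CheckedCStringMap<int> map;
  map.Append(foo, 1);
  map.Append(bar, 2);
  if (map.IsSorted())
    return false; // must be flagged as unsorted after Append()
  map.Sort();
  const int *v = map.Find(bar);
  return map.IsSorted() && v != nullptr && *v == 2;
}
```

With a check like this, a decode path that forgets to call Sort() would hit the assertion on the first lookup in a debug build, rather than silently failing to find names.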