This is an archive of the discontinued LLVM Phabricator instance.

Minidump plugin: functions parsing memory structures and filtering module list
ClosedPublic

Authored by dvlahovski on Oct 13 2016, 10:32 AM.

Details

Summary

Now the Minidump parser can parse the:

  1. MemoryInfoList - containing region info about memory ranges (readable, writable, executable)
  2. Memory64List - this is the stuct used when the Minidump is a full-memory one.
  3. Adding filtering of the module list (shared libraries list) - there can be mutliple records in the module list under the same name but with different load address (e.g. when the binary has non contigious sections). GetFilteredModuleList eliminates the duplicated modules, leaving the one with the lowest load addr.

Added unit tests for everything.

Diff Detail

Repository
rL LLVM

Event Timeline

dvlahovski retitled this revision from to Minidump plugin: memory stuff and filtering module list.
dvlahovski updated this object.
dvlahovski added reviewers: labath, zturner.
dvlahovski added subscribers: amccarth, lldb-commits.

Forgot to run clang-format. Also changed a helper function in the tests to make it simpler.

dvlahovski retitled this revision from Minidump plugin: memory stuff and filtering module list to Minidump plugin: functions parsing memory structures and filtering module list.Oct 13 2016, 11:08 AM
labath edited edge metadata.Oct 14 2016, 4:31 AM

First round of comments from me :).

source/Plugins/Process/minidump/MinidumpParser.cpp
248 ↗(On Diff #74551)

If you use the emplace function, and then check the returned value, you will avoid doing two lookups into the table.

256 ↗(On Diff #74551)

const auto &, to avoid a copy. You can also call reserve() on the vector to avoid reallocations.

387 ↗(On Diff #74551)

these would probably fit on one line if you defined helper constants "yes" and "no" like you did in the tests.

source/Plugins/Process/minidump/MinidumpParser.h
79 ↗(On Diff #74551)

This would be easier to use if was declared like this:

... GetFilteredModuleList();

and it called GetModuleList under the hood. (Unless you actually anticipate needing to filter random module lists, in which case it should probably be static as it does not touch the local vars). And in any case, it should not take a reference to the ArrayRef.

85 ↗(On Diff #74551)

We use Do prefix is used for internal helper functions, when you cannot think of a better name. This functions is not that, so it should be just ReadMemory. That said, I don't think this function should be here at all.
An interface more consistent with the rest of the class would be:

llvm::ArrayRef<uint8_t> GetMemory(lldb::addr_t addr, size_t size);

It involves no copying (but the caller can still easily memcpy() it if he really needs to. As for the corner cases (callee asks for a region which is only partially available, you can define whichever behaviour makes more sense for your use case. If after this, you find that the function is too small to be useful, you can keep it as a private function inside the register context class.

87 ↗(On Diff #74551)
llvm::Optional<MemoryRegionInfo> GetMemoryRegionInfo(lldb::addr_t)

seems more consistent.

unittests/Process/minidump/MinidumpParserTest.cpp
144 ↗(On Diff #74551)

Why not expect the exact numbers here? They're not going to change between runs.

dvlahovski updated this revision to Diff 74694.Oct 14 2016, 8:31 AM
dvlahovski marked 7 inline comments as done.
dvlahovski edited edge metadata.

Resolved Pavel's remarks.

Also added a unit test for the GetMemory function.

zturner added inline comments.Oct 14 2016, 1:02 PM
source/Plugins/Process/minidump/MinidumpParser.cpp
242 ↗(On Diff #74694)

I don't know how big the minidumps you're working with are or if performance is a concern, but std::map<> is more or less a piece of junk. The only time it should really be used is if you need a deterministic iteration order. std::unordered_map<> is slightly better. Better still is llvm::StringMap<>

281 ↗(On Diff #74694)

data.empty()?

284 ↗(On Diff #74694)

!data.empty()?

291 ↗(On Diff #74694)

auto & to avoid a copy.

320 ↗(On Diff #74694)

auto &

source/Plugins/Process/minidump/MinidumpTypes.cpp
222 ↗(On Diff #74694)

I don't think you need the conditional here do you? Just data = data.drop_front(sizeof_MinidumpMemoryInfoListHeader).

source/Plugins/Process/minidump/MinidumpTypes.h
268 ↗(On Diff #74694)

Can you use llvm/ADT/BitmaskEnum.h here to give this enum bitwise operators?

274 ↗(On Diff #74694)

Same

294 ↗(On Diff #74694)

Same

labath added inline comments.Oct 17 2016, 2:26 AM
source/Plugins/Process/minidump/MinidumpParser.cpp
15 ↗(On Diff #74694)

I think this is not needed anymore.

252 ↗(On Diff #74694)

The reason I suggested emplace instead of insert was that you could strip one level of std::make_pair here.
I.e.

std::tie(iter, exists) = lowest_addr.emplace( module_name, std::make_pair(module.base_of_image, &module));
dvlahovski updated this revision to Diff 74824.Oct 17 2016, 4:08 AM
dvlahovski marked 11 inline comments as done.

Changed std::map with llvm::StringMap
Using containter.empty() instead of containter.size() == 0
Avoiding copy by using for(const auto& ...)
Added bitwise operations to the new enum classes

source/Plugins/Process/minidump/MinidumpTypes.cpp
222 ↗(On Diff #74694)

header->size_of_header is >= sizeof(MinidumpMemoryInfoListHeader) here. So, yes, I can simplify it like this: data = data.drop_front(header->size_of_header - sizeof(MinidumpMemoryInfoListHeader));

dvlahovski updated this object.Oct 17 2016, 4:11 AM

looks good as far as i am concerned

labath accepted this revision.Oct 18 2016, 3:57 AM
labath edited edge metadata.
This revision is now accepted and ready to land.Oct 18 2016, 3:57 AM
This revision was automatically updated to reflect the committed changes.