Add a dynamic loader plug-in class for WebAssembly modules.
Depends on D72650.
Differential D72751
[LLDB] Add DynamicLoaderWasmDYLD plugin for WebAssembly debugging paolosev on Jan 14 2020, 11:33 PM. Authored by
Details
Diff Detail
Event TimelineComment Actions Testing dynamic loaders is a bit tricky as they require an actual process around. The best thing available to us right now is the "gdb-client" approach, which consists of mocking the responses of the gdb server. It's not the easiest way to write tests, but I don't think it should be that difficult in this case -- you shouldn't need to mock that many packets -- the main one is qXfer:libraries. Then you should be able to run something like "image lookup -a" (or SBTarget::ResolveLoadAddress, if you want to try your hand at the scripting API) and check that it resolves to the correct section+offset pair. You can look at the existing tests in packages/Python/lldbsuite/test/functionalities/gdb_remote_client/ to see how this works... I will have some more questions about the interaction of this function with ObjectFileWasm::SetLoadAddress, but I need to think this over a bit...
Comment Actions I have verified the logic of the dynamic loader quite carefully, but there are a couple of things to clarify. A Wasm module is loaded at a 64 bit address, where the upper 32 bits are used as module identifier. Let’s say that we have a module with Id==4, so it will be loaded at address 0x00000004`00000000. Each section is loaded at its relative file offset. Therefore if the code section starts at file offset 0x4d in the Wasm module, we call: The module can also contain embedded DWARF sections, which will also be loaded at their relative file offset in the same way. And since there cannot be duplicated sections in a module, there is no overlapping, we can always convert a load address back into a section. However, there are two complications. The first is that we need to call Target::SetSectionLoadAddress() twice, from two different places. First we need to call Target::SetSectionLoadAddress() in ObjectFileWasm::SetLoadAddress(), and then again in DynamicLoaderWasmDYLD::DidAttach(). The reason for this seems to originate in the sequence of function calls: In DynamicLoaderWasmDYLD::DidAttach() we call ProcessGDBRemote::LoadModules() to get list of loaded modules from the remote (Wasm engine).
then:
So, at the end of LoadModules() in DynamicLoaderWasmDYLD::DidAttach() the SectionLoadList is empty, and we need to set it again by calling Target::.SetSectionLoadAddress() again. The second problem is that the Code Section needs to be initialized (in ObjectFileWasm::CreateSections()) with m_file_addr = m_file_offset = 0, and not with the actual file offset of the Code section in the Wasm file. If we set Section::m_file_addr and Section::m_file_offset to the actual code offset, the DWARF info does not work correctly. I have some doubts regarding the DWARF data generated by Clang for a Wasm target. Looking at an example, for a Wasm module that has the Code section at offset 0x57, I see this DWARF data: 0x0000000b: DW_TAG_compile_unit […] DW_AT_low_pc (0x0000000000000000) DW_AT_ranges (0x00000000 [0x00000002, 0x0000000e) [0x0000000f, 0x0000001a) [0x0000001b, 0x00000099) [0x0000009b, 0x0000011c)) The documentation says that “Wherever a code address is used in DWARF for WebAssembly, it must be the offset of an instruction relative within the Code section of the WebAssembly file.” Comment Actions [ looping in @aadsm for the svr4 stuff ] Thanks for adding the test, and for the detailed writeup. Please find my comments inline. I hope so. :) This seems like a bug in ProcessGDBRemote::LoadModules. It seems wrong/wasteful/etc to do all this work to compute the section load addresses only to have them be thrown away by SetExecutableModule. Maybe all it would take is to reverse the order of these two actions, so that the load addresses persist? Can you try something like that? On a side note, ProcessGDBRemote::LoadModules seems a bit internally inconsistent. At one place it claims that "The main executable will never be included in libraries-svr4", but then it goes on to set an executable module anyway. This could in fact be a clue as to why this problem hasn't showed up on other platforms -- if the remote does not send the executable, then SetExecutableModule is not called. On a side-side note, this may mean that you sending the main wasm file through qXfer:libraries-svr4 may not be correct. However, fixing that would mean finding another way to communicate the main executable name/address. I don't think we currently have an easy way to do that so it may be better to fix ProcessGDBRemote::LoadModules, given that it "almost" supports executables. I am also worried about the fact that SymbolFileDWARF::CalculateAbilities requires the module to be "loaded". That shouldn't be normally required. That function does a very basic check on some sections, and this should work fine without those sections having a "load address", even if they are actually being loaded from target memory. I think this means there are some additional places where ObjectFileWasm should use m_memory_addr instead of something else...
That's interesting. I don't think that clang is really wrong/non-conforming here, but this choice plays a rather poorly with the way lldb handles object files (and how your remote presents them). In this case special casing the code section may be fine, but you should be aware that there are places in lldb which expect that the "load bias" (the delta between file and load addresses) is the same for all sections in a module (because that's how it works elsewhere), and they may not like this. However, if you have only one code section, I think you should be mostly fine. I do think that this can be handled in a better way (I'm pretty sure you don't need to zero out the file offset for instance), but we can wait with that until we resolve the first point... Comment Actions Thanks for the explanation! I wasn't quite clear on "executable module" here, but after your comments I realized that Target::SetExecutableModule() should not probably be called also for Wasm modules. Comment Actions Yeah, I'm not sure why the LoadModules function is calling target.SetExecutableModule. It is true that the libraries-svr4 will not include the main executable in its list. Comment Actions I am fine that. I don't really know enough about wasm to say if you should have something like the "main" module, but I don't think it should make a big difference to lldb anyway. And another benefit to that is that we can say we stick to the qXfer "spec" and only send the shared libraries over.
Yes, let's do that. Can you check what happens if you just move the file_offset = sect_info.offset & 0xffffffff; line in ObjectFileWasm outside of the if(!code) block? My guess is that you'll just need to replace some section->GetFileOffset() calls with ->GetFileAddress(). I'm pretty sure those calls will be only in wasm code because other object formats don't have sections at file offset zero, and everything is fine with that.
Comment Actions Thanks for digging this up, Antonio. My impression of that thread is that the author did not fully understand what Greg was asking him to do, and then that discussion got buried in other stuff. Given that qXfer is not supposed to send shared libraries, wasm is not going to be using it, and it doesn't look like this code could ever work, I'm tempted to just remove this SetExecutable block. :) Comment Actions Modified to set m_file_offset to be the correct offset of the Code section. This also simplifies the code in ObjectFileWasm to avoid a special case for the code section in ObjectFileWasm::SetLoadAddress. Comment Actions Thanks. I am glad that we were able to sort that out. Now that there's nothing wasm-specific in DynamicLoaderWasmDYLD::LoadModuleAtAddress, I have another question. :) The code in that function is a subset of the base DynamicLoader::LoadModuleAtAddress method you are overriding. Is there a reason for that? What do you think about that? Comment Actions I would be interested to see if this helps as well.
The logic in ProcessGDBRemote::LoadModules is bad, the executable should be set before any sections are loaded.
Is there a "qXfer:executable"? Seems from the code in ProcessGDBRemote::LoadModules() that people were seeing the main executable in the list somewhere. Be a shame to not allow it.
So a lot of these problems might go away if the modify the ObjectFileWASM to "do the right thing" when the ObjectFile is from a live process. Object file instances know if there are from a live process because the ObjectFile members: lldb::ProcessWP m_process_wp; const lldb::addr_t m_memory_addr; Will be filled in. So the object file doesn't need to have its sections loaded in a target in order to read section contents right? The object file can be asked to read its section data via ObjectFile::ReadSectionData(...) (2 variants). So Pavel is correct, the sections don't need to be loaded for anything in SymbolFileDWARF::CalculateAbilities(...). The section list does need to be created and available during SymbolFileDWARF::CalculateAbilities(...). We just need to be able to get the section contents and poke around at the DWARF bits.
So the DWARF plug-ins definitely expect the addresses in the DWARF to be file addresses where if you ask the module for its section list, it can look up this "file address" in the list, it will find a single section. If this is not the case, you will need to make a special SymbolFileDWARFWasm subclass and this will be very tricky as any address you get from the DWARF will need to be converted so that the lookup the section list will work. Comment Actions You are right! At this point DynamicLoaderWasmDYLD::LoadModuleAtAddress does not do anything specific to Wasm and there is no reason to override DynamicLoader::LoadModuleAtAddress. We can just call that base function, which should also work when the Wasm module is in the local filesystem. Comment Actions Comments inline...
The problem went away with the realization that ObjectFileWasm type should be eTypeSharedLibrary, not eTypeExecutable.
Yes, this seems to be the case with the current implementation of ObjectFileWASM. It creates the section list in ObjectFileWasm::SetLoadAddress which calls Target::SetSectionLoadAddress but the sections don't need to be fully loaded, and during SymbolFileDWARF::CalculateAbilities(...) ObjectFile::ReadSectionData is called to load the necessary data.
File addresses can uniquely identify a single section, there is no problem with this, and there is always a single code section per module. The only "weirdness" is that since DWARF code addresses for Wasm are calculated from the beginning of the Code section, not the beginning of the file, for the Code section, Section::m_file_offset can normally be the file offset, but Section::m_file_addr needs to be zero. This seems to make all DWARF-related code work, but, as Pavel said, maybe there could be places where LLDB expects the "load bias" to be the same for each section, which could cause problems? Comment Actions Thanks. My hopefully final question is not really for you but more like for other lldb developers (@jingham, @clayborg, etc.). Given that this plugin is now consisting of boiler plate only, I am wondering if we should not instead make it possible for this use case to work without any special plugins needed. A couple of options that come to mind are:
WDYT? This is correct, but I want to point out that the "load" in SetLoadAddress and in ReadSectionData have two very different meanings. The first one records the address of a section in the process memory, while the second one "load" the contents of a section into lldb memory (from whereever). The second one should work regardless of whether the first one was called. This is why you are able to inspect the debug info of an executable before actually running it.
The basic section loading machinery can handle sections which are "shuffled" around, but this is not true of everything (because this is not how typical object file formats work). Given that you only have one code section (no debug info or symbols should point into the debug sections) I think you should be mostly fine. In fact it would be possible to organize things such that the "load bias" is a constant, if we create an additional pseudo-section for the file header (like we do for COFF) with a negative file address. The layout would them look something like this /------------\ | header | file_addr = -sizeof(header) |------------| | code | file_addr = 0 |------------| | debug_info | file_addr = offsetof(debug_info) - sizeof(header) \------------/ This would keep the code section at address zero, and after applying a load bias of module_id | sizeof(header), everything would land in the right place. The reason I haven't proposed that is because that gets a bit messy, and so it seems acceptable to just do what you do now, provided it ends up working. Comment Actions I am fine with 1 as long as we document the DynamicLoader class to say that it will call Process::LoadModules() and will be used if no specialized loader is needed for your platform. I would like to a see a solution that will work for any process plug-in and not just ProcessGDBRemote. If we change solution 3 above to say "Make lldb_private::Process call LoadModules() itself if no dynamic loader instance is available" then solution 3 is also fine.
Yes the current approach allows anyone to load any section at any address. On Darwin systems, the DYLD shared cache will move TEXT, DATA, and other sections around such that all TEXT sections from all shared libraries in the shared cache are all in the one contiguous range. The slide is different for each section, so we have some nice flexibility with being able to set the section load address individually. They will even invert the memory order sometimes where in the file we have TEXT followed by DATA, but in the shared cache DATA appears at a lower address than __TEXT. We currently don't have the ability to load the same section at multiple addresses. This can happen when a shared library is loaded multiple times in memory, which we have seen on Android where a vendor will have a file that is the same as the base system, and the same exact file in loaded, albeit from different paths. Comment Actions Regarding:
there is a problem: if I remove DynamicLoaderWasmDYLD what happens is that DynamicLoaderStatic is found as a valid loader for a triple like "wasm32-unknown-unknown-wasm" because the Triple::OS is llvm::Triple::UnknownOS (I found out the hard way when I was registering DynamicLoaderWasmDYLD after DynamicLoaderStatic :-)). DynamicLoader *DynamicLoaderStatic::CreateInstance(Process *process, bool force) { bool create = force; if (!create) { const llvm::Triple &triple_ref = process->GetTarget().GetArchitecture().GetTriple(); const llvm::Triple::OSType os_type = triple_ref.getOS(); if ((os_type == llvm::Triple::UnknownOS)) create = true; } ... call stack: DynamicLoaderStatic::CreateInstance(lldb_private::Process * process, bool force) Line 29 DynamicLoader::FindPlugin(lldb_private::Process * process, const char * plugin_name) Line 52 lldb_private::process_gdb_remote::ProcessGDBRemote::GetDynamicLoader() Line 3993 lldb_private::Process::CompleteAttach() Line 2931 lldb_private::Process::ConnectRemote(lldb_private::Stream * strm, llvm::StringRef remote_url) Line 3022 Could ProcessGDBRemote::GetDynamicLoader behave differently just when the architecture is wasm32, maybe? But then it is probably cleaner to add this plugin class, what do you think? Comment Actions Sorry about the off-topic, but I found this bit very interesting. Greg, how does this work with code referencing the variables in the data section (e.g. static int x; int *f() { return &x; }). This code on elf, even when linked in fully position-independent mode will not contain any relocations because it is assumed that the dynamic loader will not change the relative layout of code and data (and so the code can use pc-relative addressing). This obviously does not work if data can be moved around independently. Does that mean that darwin will include some additional relocations which need to be resolved at load time? Comment Actions Well.. I think DynamicLoaderStatic is being too grabby. :) However, when I though about that idea further, I realized that any default plugin we could implement this way could not be complete, as we would be missing the part which (un)loads modules when a new shared library (dis)appears. This is something that cannot be done in a generic way, as that usually requires setting breakpoint on some special symbol, or getting notifications about shared library events in some other way. I don't know whether this is something that you will also need/plan to implement for wasm, or if one can assume that all modules are loaded from the get-go, but this is what convinced me that putting this in a separate plugin is fine. Comment Actions lgtm, per the previous comment. A couple of additional inline comments in the test.
Comment Actions No, the relocations are performed when the shared cache is made and they are removed! This also works for PLT calls from one shared library to another. If two shared libraries have PLT entries to each other's functions, those are resolved and don't require a call to the dynamic loader! There are two shared caches: one for running only and one for development. The development shared caches leave PLT entries alone so that interposing can happen. Comment Actions
So if you really have no OS, and no vendor, then this plug-in does get used. Why? Because it assumes that things will be loaded where they are (file_addr == load_addr). Could you create your WASM stuff in a way such that all addresses that are in a memory loaded object file have the file address the same as the load address? Do you ever actually have files on disk that we load for WASM? Or is it always loaded from memory? It is ok to add your plug-in that recognizes WASM with no OS and no vendor, you will just need to make sure that the plug-in is registered before DynamicLoaderStatic. Since the plug-in manager will run through each plug-in linearly when looking for a match. This is also why any "auto registration of plug-ins" will cause problems as if we don't control the plug-in registration order, we will have issues that will arise. That is a different topic though, but I thought I remembered seeing a patch that tried to do this. Comment Actions Fixed the tests as suggested, and also added a couple more tests, to cover both the case where the Wasm module is loaded from memory and the case where it is loaded from a file.
Yes, we'll need to support the case where a Wasm module is loaded at runtime, I think ProcessGDBRemote already does most of the work when it receives a stop event that contains the "library" flag and calls LoadModules() again. But I need to test this carefully. Comment Actions It looks like the wasm-DYLD directory is missing. I removed it again from the CMake file but now it's failing to build. Can you please take a look? Comment Actions I've partially reverted your change and XFAILed the tests in commit 4697e701b8cb40429818609814c7422e49b2ee07 (HEAD -> master, origin/master) Author: Jonas Devlieghere <jonas@devlieghere.com> Date: Wed Feb 5 15:30:11 2020 -0800 Partially revert "[LLDB] Add DynamicLoaderWasmDYLD plugin for WebAssembly debugging" This temporarily and partially reverts 3ec28da6d643 because it's missing a directory. Comment Actions There is Windows Build Bot failure http://lab.llvm.org:8011/builders/lldb-x64-windows-ninja/builds/13427. Can you please fix or revert it? Cannot open include file: 'Plugins/DynamicLoader/wasm-DYLD/DynamicLoaderWasmDYLD.h': No such file or directory Comment Actions I am afraid this isn't over yet. :( The tests with this patch don't seem to be compatible with python3 and are failing due to various errors: http://lab.llvm.org:8011/builders/lldb-x86_64-debian/builds/4419/steps/test/logs/stdio. Even when running the test with python2 I have had one failure due to the assert on TestWasm.py:233. I am not sure how this managed to pass for you (I guess we must have some host differences leaking in here), but overall, I think this is an overly aggressive assertion. You can't assume that lldb will read absolutely no memory, even when the module is read from disk. Maybe you could just check that none of the reads overlap the debug_info section? (I am deliberately not including the text section here, because lldb prefers to read code from memory instead of from object file). Since the situation on master was starting to get a bit out of hand (multiple concurrent breakages with frantic fixup attempts), I've tried to back everything out so we can start with a clean slate again. I am also sorry for not looking over the latest round of changes. I'll check out the changes you've since I've approved the patch tomorrow. Comment Actions This should have been fixed by https://reviews.llvm.org/rGf5f70d1c8fbf12249b4b9598f10a10f12d4db029. Comment Actions As promised, here are the comments on the new tests. I think that most of the py3 incompatibilities will go away once we get rid of the yaml preprocessing step, but it would be good to verify this with python 3 nonetheless...
Comment Actions Thank you! Removing the yaml preprocessing indeed simplify everything, now tests should work both with Python 2 and 3. Comment Actions Ok, let's give this one more try. I have a couple of inline comments for the further simplification of the test case.
Comment Actions Thank you again, I fixed the tests. If it's ok for you, I will then merge this patch (I got commit access now :)). Comment Actions Fix patch after rebasing:
Comment Actions Hi @labath, can I ask you the favor to land this patch for me? I have rebased it because the logic of SystemInitializers have changed with new macros. I asked commit access and I should have obtained it, but when I try to land this patch either with 'arc land' or with 'git push' I get a permission error: git push --dry-run Password for 'https://paolosevMSFT@github.com': remote: Permission to llvm/llvm-project.git denied to paolosevMSFT. fatal: unable to access 'https://paolosevMSFT@github.com/llvm/llvm-project.git/': The requested URL returned error: 403 Maybe there is something I have not understood in the process... Comment Actions Committed as c1121908aace019b3e31e24def58a21a978531cd.
"arc land" never worked. Don't even try it -- all it can do is mess up your local repo. :) The "git push" error looks like a generic problem with authenticating to github. I'd try creating a dummy personal repo and seeing if you can push there first. Switching to a different transport protocol (git@github.com + authentication via ssh keys) is also worth a shot.
Comment Actions @JDevlieghere added inline comments:
This was discussed in one of the comments above. If DynamicLoaderStatic preceeds DynamicLoaderWasmDYLD then it is recognized as a valid loader for a triple like "wasm32-unknown-unknown-wasm" because the Triple::OS is llvm::Triple::UnknownOS. Comment Actions I think it depends on whether the DynamicLoaderStatic should be a fallback. If it doesn't make sense then yes, I think we should reject that triple there. Comment Actions
It should not be a fallback. Ok! I'll create a new patch with this change. Comment Actions I think we should just have DynamicLoaderStatic disqualify itself for wasm files -- using it will never work there, so why should it pretend to support them... Comment Actions The triple clearly has an environment set of "wasm" so it should be easy to detect this in the DynamicLoaderStatic and stop it from saying it can handle it. The main issue then is do we have DynamicLoaderStatic say it can't handle anything if any environment is set, or just not for "wasm" (if OS and vendor are not set either). Comment Actions "Wasm" is an "architecture" (and an object file format), not an environment, but yeah, I think it should just check for wasm specifically for now. If we see a pattern developing later, we can change that... Comment Actions If we switch to auto registration of plug-in in the near future there are many things we need to watch out for and some notion of ordering needs to happen. A few examples:
, we might need to bolster the plug-in system a bit to be more like the SymbolFile plug-ins. For symbol file plug-ins we have each on calculate abilities and say Comment Actions I went through the DYLD plugins and only the WASM and Hexagon plugin check the ArchType rather than the OSType, so I've created a patch to explicitly reject those in the DynamicLoaderStatic plugin: https://reviews.llvm.org/D74780 Comment Actions Hi @paolosev, the lldb sanitized bot is flagging a container-overflow error here. I know that this /can/ have FPs when sanitized and unsanitized code is mixed, but we should be in purely sanitized code here, and this looks like a valid report. PTAL. ================================================================= ==11283==ERROR: AddressSanitizer: container-overflow on address 0x615000016184 at pc 0x00010b4608f0 bp 0x7ffee4f00130 sp 0x7ffee4eff8f8 READ of size 512 at 0x615000016184 thread T0 #0 0x10b4608ef in __asan_memcpy+0x1af (libclang_rt.asan_osx_dynamic.dylib:x86_64+0x418ef) #1 0x1116486d5 in lldb_private::MemoryCache::Read(unsigned long long, void*, unsigned long, lldb_private::Status&) Memory.cpp:189 #2 0x11119d0e9 in lldb_private::Module::GetMemoryObjectFile(std::__1::shared_ptr<lldb_private::Process> const&, unsigned long long, lldb_private::Status&, unsigned long) Module.cpp:298 #3 0x11169eeef in lldb_private::Process::ReadModuleFromMemory(lldb_private::FileSpec const&, unsigned long long, unsigned long) Process.cpp:2402 #4 0x11113337b in lldb_private::DynamicLoader::LoadModuleAtAddress(lldb_private::FileSpec const&, unsigned long long, unsigned long long, bool) DynamicLoader.cpp:212 #5 0x111ed53da in lldb_private::process_gdb_remote::ProcessGDBRemote::LoadModuleAtAddress(lldb_private::FileSpec const&, unsigned long long, unsigned long long, bool) ProcessGDBRemote.cpp:4767 #6 0x111ed59b8 in lldb_private::process_gdb_remote::ProcessGDBRemote::LoadModules() ProcessGDBRemote.cpp:4801 #7 0x1119c59aa in lldb_private::wasm::DynamicLoaderWasmDYLD::DidAttach() DynamicLoaderWasmDYLD.cpp:63 #8 0x1116a3a97 in lldb_private::Process::CompleteAttach() Process.cpp:2930 #9 0x1116a6bdf in lldb_private::Process::ConnectRemote(lldb_private::Stream*, llvm::StringRef) Process.cpp:3015 #10 0x110a362ee in lldb::SBTarget::ConnectRemote(lldb::SBListener&, char const*, char const*, lldb::SBError&) SBTarget.cpp:559 Comment Actions
I think I might have found a problem with the existing code to cache memory read from a remote process. Looking at: the error is easily explained. From ProcessGDBRemote::LoadModules we read the initial chunk (512 bytes) of an object file: _lldb.pyd!lldb_private::MemoryCache::Read(unsigned __int64 addr, void * dst, unsigned __int64 dst_len, lldb_private::Status & error) Line 239 C++ _lldb.pyd!lldb_private::Process::ReadMemory(unsigned __int64 addr, void * buf, unsigned __int64 size, lldb_private::Status & error) Line 1953 C++ _lldb.pyd!lldb_private::Module::GetMemoryObjectFile(const std::shared_ptr<lldb_private::Process> & process_sp, unsigned __int64 header_addr, lldb_private::Status & error, unsigned __int64 size_to_read) Line 300 C++ _lldb.pyd!lldb_private::Process::ReadModuleFromMemory(const lldb_private::FileSpec & file_spec, unsigned __int64 header_addr, unsigned __int64 size_to_read) Line 2402 C++ _lldb.pyd!lldb_private::DynamicLoader::LoadModuleAtAddress(const lldb_private::FileSpec & file, unsigned __int64 link_map_addr, unsigned __int64 base_addr, bool base_addr_is_offset) Line 212 C++ _lldb.pyd!lldb_private::process_gdb_remote::ProcessGDBRemote::LoadModuleAtAddress(const lldb_private::FileSpec & file, unsigned __int64 link_map, unsigned __int64 base_addr, bool value_is_offset) Line 4769 C++ _lldb.pyd!lldb_private::process_gdb_remote::ProcessGDBRemote::LoadModules() Line 4803 C++ In MemoryCache::Read, since this data is not cached yet, we call m_process.ReadMemoryFromInferior to actually read the memory (lines 174-241), look at the bottom of: size_t MemoryCache::Read(addr_t addr, void *dst, size_t dst_len, Status &error) { [...] while (bytes_left > 0) { [...] BlockMap::const_iterator pos = m_L2_cache.find(curr_addr); BlockMap::const_iterator end = m_L2_cache.end(); if (pos != end) { size_t curr_read_size = cache_line_byte_size - cache_offset; if (curr_read_size > bytes_left) curr_read_size = bytes_left; memcpy(dst_buf + dst_len - bytes_left, pos->second->GetBytes() + cache_offset, curr_read_size); [...] } // We need to read from the process if (bytes_left > 0) { assert((curr_addr % cache_line_byte_size) == 0); std::unique_ptr<DataBufferHeap> data_buffer_heap_up( new DataBufferHeap(cache_line_byte_size, 0)); size_t process_bytes_read = m_process.ReadMemoryFromInferior( curr_addr, data_buffer_heap_up->GetBytes(), data_buffer_heap_up->GetByteSize(), error); if (process_bytes_read == 0) return dst_len - bytes_left; if (process_bytes_read != cache_line_byte_size) data_buffer_heap_up->SetByteSize(process_bytes_read); m_L2_cache[curr_addr] = DataBufferSP(data_buffer_heap_up.release()); // We have read data and put it into the cache, continue through the // loop again to get the data out of the cache... } First, we allocate a DataBufferHeap with the size of our cache_line_byte_size, 512 bytes, and we pass it to ReadMemoryFromInferior(). This should be very simple to fix but the simple fix of just reading the available bytes doesn't work: Module::GetMemoryObjectFile expects to always be able to read by default 512 bytes, and it fails if the object file is smaller: ObjectFile *Module::GetMemoryObjectFile(const lldb::ProcessSP &process_sp, lldb::addr_t header_addr, Status &error, size_t size_to_read) { [...] const size_t bytes_read = process_sp->ReadMemory(header_addr, data_up->GetBytes(), data_up->GetByteSize(), readmem_error); if (bytes_read == size_to_read) { [...] // ok... } else { error.SetErrorStringWithFormat("unable to read header from memory: %s", readmem_error.AsCString()); } [...] So probably, in order not to change the client code too much we should always read size_to_read bytes and pad the array with zeros if the file is smaller? it's getting a bit late today but I should have a patch early tomorrow. Comment Actions
It sounds like the fix will be fairly straightforward. I think anyone on this thread should be able to review that. |
this should be available as lldb.LLDB_INVALID_ADDRESS