Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
lldb/source/Target/TargetProperties.td | ||
---|---|---|
180 ↗ | (On Diff #336939) | What do you think about FetchReadonlyDataFromFileCache? @jasonmolenda: Does this description sound better? Desc<"Prefer reading read-only data from object files on disk. When debugging remotely this can be much faster than fetching memory from the target, but it might miss modifications performed by the operating system at load time.">; |
lldb/source/Target/Target.cpp | ||
---|---|---|
1760 | With https://lldb.llvm.org/cpp_reference/classlldb__private_1_1Flags.html |
Another question for @jasonmolenda: Do you happen to know if the existing prefer_file_cache mechanism applies relocations?
We can test it to make sure like
% cat >a.c char *globstr = "hello"; % clang -dynamiclib -o liba.dylib a.c % cat >b.c extern char *globstr; int main() { char *c = globstr; return c[0]; } % clang -g -o b.out b.c -L. -la % lldb b.out (lldb) disass -b -n main b.out`main: b.out[0x100003f90] <+0>: 55 pushq %rbp b.out[0x100003f91] <+1>: 48 89 e5 movq %rsp, %rbp b.out[0x100003f94] <+4>: 48 8b 05 65 00 00 00 movq 0x65(%rip), %rax ; (void *)0x0000000000000000 (lldb) br s -n main (lldb) r (lldb) disass -b -n main b.out`main: 0x100003f90 <+0>: 55 pushq %rbp 0x100003f91 <+1>: 48 89 e5 movq %rsp, %rbp 0x100003f94 <+4>: 48 8b 05 65 00 00 00 movq 0x65(%rip), %rax ; (void *)0x0000000100128000: globstr 0x100003f9b <+11>: c7 45 fc 00 00 00 00 movl $0x0, -0x4(%rbp) (lldb) ima loo -va 0x100003f9b+0x65 Address: b.out[0x0000000100004000] (b.out.__DATA_CONST.__got + 0) Summary: (void *)0x0000000100128000: globstr Module: file = "/tmp/b.out", arch = "x86_64" (lldb) x/gx 0x100003f9b+0x65 0x100004000: 0x0000000100128000 (lldb) tar mod dump sect b.out 0x00000003 data-ptrs [0x0000000100004000-0x0000000100004008) rw- 0x00004000 0x00000008 0x00000006 b.out.__DATA_CONST.__got
This __DATA,got section is marked as rw, so we read it from memory.
I'm not super convinced on the setting. It seems like something you'd add some people could work around a mistake we've made here.
I think we have two cases:
Someone calls Target::ReadMemory prefer_file_cache=true on a writable section. This is an easy mistake for people to make, they assume prefer_file_cache will only prefer the file cache on read-only Sections.
Someone reads Target::ReadMemory with prefer_file_cache=false on a read-only section. This is *often* a mistake, but there are times where we genuinely want to see raw memory.
Let's think about what the default behavior of Target::ReadMemory should be. The majority use case is that we prefer the file cache if it is a read-only section. It's a performance bug in almost every case to behave differently, I maintain. There are times when we want to show actual raw memory to people. Some naughty programs that manage to modify themselves, we need to see that real naughtiness, not show what the file had and pretend that is what is in memory.
I think Target::ReadMemory should have a force_live_memory argument, not a prefer_file_cache. It could be the final argument with a default value of force_live_memory==false. Almost everyone should let Target::ReadMemory check if this address is in a RO-section, and let it get the data from the file cache if so. If it's not in a RO section, or any section, read from live memory. And have the force_live_memory arg to override this.
This is a larger set of changes though, and I don't mean to sign anyone up for more than they intended. But, just thinking of this from a clean slate, I think that's the better design. Adrian, what do you think?
Hm, a little more thinking. Target::ReadMemory prefers a file cache if it is available. Process::ReadMemory only uses live memory. But Process::ReadMemory doesn't work when we have run lldb on a file and not run yet. e.g.
% lldb (lldb) tar cr -d /tmp/b.out Current executable set to '/tmp/b.out' (x86_64). (lldb) disass -n main b.out`main: b.out[0x100003f90] <+0>: pushq %rbp b.out[0x100003f91] <+1>: movq %rsp, %rbp b.out[0x100003f94] <+4>: movq 0x65(%rip), %rax ; (void *)0x0000000000000000 b.out[0x100003f9b] <+11>: movl $0x0, -0x4(%rbp) (lldb) ima loo -va 0x100003f9b+0x65 Address: b.out[0x0000000100004000] (b.out.__DATA_CONST.__got + 0) Summary: (void *)0x0000000000000000 Module: file = "/tmp/b.out", arch = "x86_64" (lldb) x/gx 0x100003f9b+0x65 0x100004000: 0x0000000000000000 (lldb)
I don't have a Process, so I must be using Target::ReadMemory here. and even though this section is read-write, we don't have any live memory so we're fetching it from the file. So that's an interesting extra wrinkle I didn't remember from Target::ReadMemory.
Trying to reword my thinking a little more clearly.
Process::ReadMemory - only works when you have a Process. Always reads directly from memory.
Target::ReadMemory - can be used with or without a Process.
- If force_live_memory == true, will read from memory if there is a Process. Else will read from file, if a Section covers this address.
- If the address is in a Section that is read-only, read from the file.
- If the address is in a section that is read-write, read from memory if there is a Process. Else read from the file, if a Section covers this address.
And I would make Target::ReadMemory's force_live_memory argument have a default value of false, because few people intend for this.
I'm open to hearing different opinions on this rethink! eg @clayborg @jingham
Let's think about what the default behavior of Target::ReadMemory should be. The majority use case is that we prefer the file cache if it is a read-only section. It's a performance bug in almost every case to behave differently, I maintain. There are times when we want to show actual raw memory to people. Some naughty programs that manage to modify themselves, we need to see that real naughtiness, not show what the file had and pretend that is what is in memory.
I think Target::ReadMemory should have a force_live_memory argument, not a prefer_file_cache. It could be the final argument with a default value of force_live_memory==false. Almost everyone should let Target::ReadMemory check if this address is in a RO-section, and let it get the data from the file cache if so. If it's not in a RO section, or any section, read from live memory. And have the force_live_memory arg to override this.
This is a larger set of changes though, and I don't mean to sign anyone up for more than they intended. But, just thinking of this from a clean slate, I think that's the better design. Adrian, what do you think?
Let's try to do the right thing here. I think your suggestions make a lot of sense.
Thanks for the input @jasonmolenda. FWIW I also think it's worth implementing this correctly.
Could you help me figure out which callers I should change to force_live_memory though?
Here's a list of everywhere ReadMemory is called:
SBTarget.cpp ReadInstructions ReadMemory CommandObjectMemory.cpp DoExecute Address.cpp ReadBytes Disassembler.cpp ParseInstructions Value.cpp GetValueAsData ValueObject.cpp GetPointeeData IRMemoryMap.cpp ReadMemory DynamicLoaderDarwinKernel.cpp ReadKextSummaries ReadKextSummaryHeader DynamicLoaderDarwin.cpp GetThreadLocalData EmulateInstructionMIPS.cpp SetInstruction UnwindAssemblyInstEmulation.cpp GetNonCallSiteUnwindPlanFromAssembly UnwindAssembly-x86.cpp AugmentUnwindPlanFromCallSite FirstNonPrologueInsn GetFastUnwindPlan GetNonCallSiteUnwindPlanFromAssembly Target.cpp ReadCStringFromMemory ReadScalarIntegerFromMemory
Maybe force live memory only for the functions in DynamicLoaderDarwinKernel.cpp and DynamicLoaderDarwin.cpp?
I should say, my suggestion of changing Target::ReadMemory is just an opinion, please do feel free to disagree!
I think if you look at the callers, I suspect all (or nearly all) of them are passing prefer_file_cache == true right now, which is the force_live_memory==false default case. Anyone passing prefer_file_cache==false would want to pass force_live_memory==true, and it may be that they are doing this incorrect (you were originally looking at an instance where that was true in swift)
An example where we WANT prefer_file_cache==false aka force_live_memory==true is in CommandObjectMemory.cpp. When people use the memory read command, we want to show them the actual memory contents [*], we don't want to optimize by getting data from a file.
- we still replace breakpoint instructions with the real instruction when reading instructions, iirc.
@jasonmolenda actually, most of them are passing prefer_file_cache = false, unfortunately. But they might be doing this because there's they weren't sure if the memory was writable, so this was the safer option. I want to change all the ones we deem safe to force_live_memory = false, which is why I ask which files you think this change would be safe.
Maybe force live memory only for the functions in DynamicLoaderDarwinKernel.cpp and DynamicLoaderDarwin.cpp?
Taking a quick look at the dynamic loader plugins, DynamicLoaderDarwin is calling ReadMemory in DynamicLoaderDarwin::GetThreadLocalData and I'm pretty sure the thread load storage address it is reading will be in a writable section, so our reimagined Target::ReadMemory will read from memory anyway.
DynamicLoaderDarwinKernel is calling Target::ReadMemory in ReadKextSummaryHeader and ReadKextSummaries -- it needs to read live memory, but
(lldb) ima loo -vs gLoadedKextSummaries 1 symbols match 'gLoadedKextSummaries' in /tmp/dl/mach.development.t8002: Address: mach.development.t8002[0x808c66b8] (mach.development.t8002.__DATA.__common + 394936) (lldb) tar mod dump sect 0x00000011 zero-fill [0x0000000080866000-0x00000000808c8294) rw- 0x00000000 0x00000000 0x00000001 mach.development.t8002.__DATA.__common
they're reading from a writable Section so we would do the right thing.
Changed the approach from a target setting to updating Target::ReadMemory to prefer the file cache (if section is read-only)
@jasonmolenda I've updated all the function calls to prefer the file cache (except the one you pointed out), but I'm a little worried about this, since I changed a lot of function calls. Maybe I could update the function signature to force_live_memory (and default it to false), but keep all the current calls having the same behavior (by negating whatever was passed previously as prefer_file_cache). What do you think?
I think that's a safe way to approach it. This will make it easier for new callers to do the right thing when they look at the method arguments, and we can go through and audit these more carefully by hand in the future.
clang-tidy: error: 'lldb/Core/Address.h' file not found [clang-diagnostic-error]
not useful