This is an archive of the discontinued LLVM Phabricator instance.

Add a setting that enables memory to be read from the file cache instead of process when the section LLDB is reading from is read-only
ClosedPublic

Authored by augusto2112 on Apr 12 2021, 12:55 PM.

Diff Detail

Event Timeline

augusto2112 requested review of this revision.Apr 12 2021, 12:55 PM
augusto2112 created this revision.
Herald added a project: Restricted Project. · View Herald TranscriptApr 12 2021, 12:55 PM
aprantl added inline comments.Apr 12 2021, 1:17 PM
lldb/source/Target/TargetProperties.td
180 ↗(On Diff #336939)

What do you think about FetchReadonlyDataFromFileCache?

@jasonmolenda: Does this description sound better?

Desc<"Prefer reading read-only data from object files on disk. When debugging remotely this can be much faster than fetching memory from the target, but it might miss modifications performed by the operating system at load time.">;
aprantl added inline comments.Apr 12 2021, 1:23 PM
lldb/source/Target/Target.cpp
1760

With https://lldb.llvm.org/cpp_reference/classlldb__private_1_1Flags.html
you can write this as Flags(permissions).Test(ePermissionsWritable)

Another question for @jasonmolenda: Do you happen to know if the existing prefer_file_cache mechanism applies relocations?

Another question for @jasonmolenda: Do you happen to know if the existing prefer_file_cache mechanism applies relocations?

We can test it to make sure like

% cat >a.c
char *globstr = "hello";
% clang -dynamiclib -o liba.dylib a.c
% cat >b.c
extern char *globstr;
int main() { char *c = globstr; return c[0]; }
% clang -g -o b.out b.c -L. -la
% lldb b.out
(lldb) disass -b -n main
b.out`main:
b.out[0x100003f90] <+0>:  55                    pushq  %rbp
b.out[0x100003f91] <+1>:  48 89 e5              movq   %rsp, %rbp
b.out[0x100003f94] <+4>:  48 8b 05 65 00 00 00  movq   0x65(%rip), %rax          ; (void *)0x0000000000000000

(lldb) br s -n main
(lldb) r
(lldb) disass -b -n main
b.out`main:
    0x100003f90 <+0>:  55                    pushq  %rbp
    0x100003f91 <+1>:  48 89 e5              movq   %rsp, %rbp
    0x100003f94 <+4>:  48 8b 05 65 00 00 00  movq   0x65(%rip), %rax          ; (void *)0x0000000100128000: globstr
    0x100003f9b <+11>: c7 45 fc 00 00 00 00  movl   $0x0, -0x4(%rbp)

(lldb) ima loo -va 0x100003f9b+0x65
      Address: b.out[0x0000000100004000] (b.out.__DATA_CONST.__got + 0)
      Summary: (void *)0x0000000100128000: globstr
       Module: file = "/tmp/b.out", arch = "x86_64"

(lldb) x/gx 0x100003f9b+0x65
0x100004000: 0x0000000100128000

(lldb) tar mod dump sect b.out

  0x00000003 data-ptrs        [0x0000000100004000-0x0000000100004008)  rw-  0x00004000 0x00000008 0x00000006 b.out.__DATA_CONST.__got

This __DATA,got section is marked as rw, so we read it from memory.

I'm not super convinced on the setting. It seems like something you'd add some people could work around a mistake we've made here.

I think we have two cases:

Someone calls Target::ReadMemory prefer_file_cache=true on a writable section. This is an easy mistake for people to make, they assume prefer_file_cache will only prefer the file cache on read-only Sections.

Someone reads Target::ReadMemory with prefer_file_cache=false on a read-only section. This is *often* a mistake, but there are times where we genuinely want to see raw memory.

Let's think about what the default behavior of Target::ReadMemory should be. The majority use case is that we prefer the file cache if it is a read-only section. It's a performance bug in almost every case to behave differently, I maintain. There are times when we want to show actual raw memory to people. Some naughty programs that manage to modify themselves, we need to see that real naughtiness, not show what the file had and pretend that is what is in memory.

I think Target::ReadMemory should have a force_live_memory argument, not a prefer_file_cache. It could be the final argument with a default value of force_live_memory==false. Almost everyone should let Target::ReadMemory check if this address is in a RO-section, and let it get the data from the file cache if so. If it's not in a RO section, or any section, read from live memory. And have the force_live_memory arg to override this.

This is a larger set of changes though, and I don't mean to sign anyone up for more than they intended. But, just thinking of this from a clean slate, I think that's the better design. Adrian, what do you think?

Hm, a little more thinking. Target::ReadMemory prefers a file cache if it is available. Process::ReadMemory only uses live memory. But Process::ReadMemory doesn't work when we have run lldb on a file and not run yet. e.g.

% lldb
(lldb) tar cr -d /tmp/b.out
Current executable set to '/tmp/b.out' (x86_64).
(lldb) disass -n main
b.out`main:
b.out[0x100003f90] <+0>:  pushq  %rbp
b.out[0x100003f91] <+1>:  movq   %rsp, %rbp
b.out[0x100003f94] <+4>:  movq   0x65(%rip), %rax          ; (void *)0x0000000000000000
b.out[0x100003f9b] <+11>: movl   $0x0, -0x4(%rbp)

(lldb) ima loo -va 0x100003f9b+0x65
      Address: b.out[0x0000000100004000] (b.out.__DATA_CONST.__got + 0)
      Summary: (void *)0x0000000000000000
       Module: file = "/tmp/b.out", arch = "x86_64"

(lldb) x/gx 0x100003f9b+0x65
0x100004000: 0x0000000000000000
(lldb)

I don't have a Process, so I must be using Target::ReadMemory here. and even though this section is read-write, we don't have any live memory so we're fetching it from the file. So that's an interesting extra wrinkle I didn't remember from Target::ReadMemory.

Trying to reword my thinking a little more clearly.

Process::ReadMemory - only works when you have a Process. Always reads directly from memory.

Target::ReadMemory - can be used with or without a Process.

  1. If force_live_memory == true, will read from memory if there is a Process. Else will read from file, if a Section covers this address.
  1. If the address is in a Section that is read-only, read from the file.
  1. If the address is in a section that is read-write, read from memory if there is a Process. Else read from the file, if a Section covers this address.

And I would make Target::ReadMemory's force_live_memory argument have a default value of false, because few people intend for this.

I'm open to hearing different opinions on this rethink! eg @clayborg @jingham

Let's think about what the default behavior of Target::ReadMemory should be. The majority use case is that we prefer the file cache if it is a read-only section. It's a performance bug in almost every case to behave differently, I maintain. There are times when we want to show actual raw memory to people. Some naughty programs that manage to modify themselves, we need to see that real naughtiness, not show what the file had and pretend that is what is in memory.

I think Target::ReadMemory should have a force_live_memory argument, not a prefer_file_cache. It could be the final argument with a default value of force_live_memory==false. Almost everyone should let Target::ReadMemory check if this address is in a RO-section, and let it get the data from the file cache if so. If it's not in a RO section, or any section, read from live memory. And have the force_live_memory arg to override this.

This is a larger set of changes though, and I don't mean to sign anyone up for more than they intended. But, just thinking of this from a clean slate, I think that's the better design. Adrian, what do you think?

Let's try to do the right thing here. I think your suggestions make a lot of sense.

Thanks for the input @jasonmolenda. FWIW I also think it's worth implementing this correctly.

Could you help me figure out which callers I should change to force_live_memory though?

Here's a list of everywhere ReadMemory is called:

SBTarget.cpp
    ReadInstructions
    ReadMemory

CommandObjectMemory.cpp
    DoExecute

Address.cpp
    ReadBytes

Disassembler.cpp
    ParseInstructions

Value.cpp
    GetValueAsData

ValueObject.cpp
    GetPointeeData

IRMemoryMap.cpp
    ReadMemory

DynamicLoaderDarwinKernel.cpp
    ReadKextSummaries
    ReadKextSummaryHeader

DynamicLoaderDarwin.cpp
    GetThreadLocalData

EmulateInstructionMIPS.cpp
    SetInstruction

UnwindAssemblyInstEmulation.cpp
    GetNonCallSiteUnwindPlanFromAssembly

UnwindAssembly-x86.cpp
    AugmentUnwindPlanFromCallSite
    FirstNonPrologueInsn
    GetFastUnwindPlan
    GetNonCallSiteUnwindPlanFromAssembly

Target.cpp  
    ReadCStringFromMemory
    ReadScalarIntegerFromMemory

Maybe force live memory only for the functions in DynamicLoaderDarwinKernel.cpp and DynamicLoaderDarwin.cpp?

I should say, my suggestion of changing Target::ReadMemory is just an opinion, please do feel free to disagree!

I think if you look at the callers, I suspect all (or nearly all) of them are passing prefer_file_cache == true right now, which is the force_live_memory==false default case. Anyone passing prefer_file_cache==false would want to pass force_live_memory==true, and it may be that they are doing this incorrect (you were originally looking at an instance where that was true in swift)

An example where we WANT prefer_file_cache==false aka force_live_memory==true is in CommandObjectMemory.cpp. When people use the memory read command, we want to show them the actual memory contents [*], we don't want to optimize by getting data from a file.

  • we still replace breakpoint instructions with the real instruction when reading instructions, iirc.

@jasonmolenda actually, most of them are passing prefer_file_cache = false, unfortunately. But they might be doing this because there's they weren't sure if the memory was writable, so this was the safer option. I want to change all the ones we deem safe to force_live_memory = false, which is why I ask which files you think this change would be safe.

Thanks for the input @jasonmolenda. FWIW I also think it's worth implementing this correctly.

Could you help me figure out which callers I should change to force_live_memory though?

Maybe force live memory only for the functions in DynamicLoaderDarwinKernel.cpp and DynamicLoaderDarwin.cpp?

Taking a quick look at the dynamic loader plugins, DynamicLoaderDarwin is calling ReadMemory in DynamicLoaderDarwin::GetThreadLocalData and I'm pretty sure the thread load storage address it is reading will be in a writable section, so our reimagined Target::ReadMemory will read from memory anyway.

DynamicLoaderDarwinKernel is calling Target::ReadMemory in ReadKextSummaryHeader and ReadKextSummaries -- it needs to read live memory, but

(lldb) ima loo -vs gLoadedKextSummaries
1 symbols match 'gLoadedKextSummaries' in /tmp/dl/mach.development.t8002:
        Address: mach.development.t8002[0x808c66b8] (mach.development.t8002.__DATA.__common + 394936)
(lldb) tar mod dump sect
  0x00000011 zero-fill        [0x0000000080866000-0x00000000808c8294)  rw-  0x00000000 0x00000000 0x00000001 mach.development.t8002.__DATA.__common

they're reading from a writable Section so we would do the right thing.

Changed the approach from a target setting to updating Target::ReadMemory to prefer the file cache (if section is read-only)

@jasonmolenda I've updated all the function calls to prefer the file cache (except the one you pointed out), but I'm a little worried about this, since I changed a lot of function calls. Maybe I could update the function signature to force_live_memory (and default it to false), but keep all the current calls having the same behavior (by negating whatever was passed previously as prefer_file_cache). What do you think?

@jasonmolenda I've updated all the function calls to prefer the file cache (except the one you pointed out), but I'm a little worried about this, since I changed a lot of function calls. Maybe I could update the function signature to force_live_memory (and default it to false), but keep all the current calls having the same behavior (by negating whatever was passed previously as prefer_file_cache). What do you think?

I think that's a safe way to approach it. This will make it easier for new callers to do the right thing when they look at the method arguments, and we can go through and audit these more carefully by hand in the future.

Updates all calls to ReadMemory to keep the current behavior.

jasonmolenda accepted this revision.Apr 16 2021, 10:50 AM

This looks good, thanks for doing this!

This revision is now accepted and ready to land.Apr 16 2021, 10:50 AM
This revision was landed with ongoing or failed builds.Apr 16 2021, 4:13 PM
This revision was automatically updated to reflect the committed changes.