Page MenuHomePhabricator

[elf-core] Improve reading memory from core file
ClosedPublic

Authored by djtodoro on Dec 30 2020, 1:36 AM.

Details

Summary

This patch tries to improve memory-read from core files (in order to improve disassembly functionality).
I am using RHEL 7.7 (linux kernel 3.10) and for a lot of cases, I was not able to disassemble some functions from backtrace when debugging crashes from core files. It outputs some dummy code as following:

(lldb) disassemble
example`fn:
    0x55deb51866b0 <+0>:   addb   %al, (%rax)
    0x55deb51866b2 <+2>:   addb   %al, (%rax)
    0x55deb51866b4 <+4>:   addb   %al, (%rax)
    0x55deb51866b6 <+6>:   addb   %al, (%rax)
    ....

The cause of the problem was the fact we are returning all the zeros from ProcessElfCore::ReadMemory() that is being called within Disassembler::ParseInstructions() and it disassembles some dummy opcodes from the buffer returned. If we are about to fill the buffer with all zeros, I guess it is safe to just return zero, and to proceed with reading from file itself. Therefore, we are removing zero bytes filling (padding) completely.

Before this patch (simple example that has been attached into the test):

$ lldb lldb/test/API/functionalities/postmortem/elf-core/linux-x86_64-for-disassemble.out -c lldb/test/API/functionalities/postmortem/elf-core/linux-x86_64-for-disassemble.core
(lldb) f 4
frame #4: 0x00007f1d2f862545 libc.so.6`__libc_start_main + 245
libc.so.6`__libc_start_main:
->  0x7f1d2f862545 <+245>: addb   %al, (%rax)
    0x7f1d2f862547 <+247>: addb   %al, (%rax)
    0x7f1d2f862549 <+249>: addb   %al, (%rax)
    0x7f1d2f86254b <+251>: addb   %al, (%rax)

After this patch applied:

(lldb) f 4
frame #4: 0x00007f1d2f862545 libc.so.6`__libc_start_main + 245
libc.so.6`__libc_start_main:
->  0x7f1d2f862545 <+245>: movl   %eax, %edi
    0x7f1d2f862547 <+247>: callq  0x39d10                   ; exit
    0x7f1d2f86254c <+252>: xorl   %edx, %edx
    0x7f1d2f86254e <+254>: jmp    0x22489                   ; <+57>

GDB was able to disassemble all the functions from core file.

Diff Detail

Event Timeline

djtodoro created this revision.Dec 30 2020, 1:36 AM
djtodoro requested review of this revision.Dec 30 2020, 1:36 AM
djtodoro edited the summary of this revision. (Show Details)Dec 30 2020, 1:38 AM

Have you by any chance learned why are we zero-filling this buffer in the first place? Seems like an odd thing to do... Maybe we should just stop zero-filling completely?

(@labath Sorry for late response, I've been away from keyboard for some time.)

Have you by any chance learned why are we zero-filling this buffer in the first place? Seems like an odd thing to do... Maybe we should just stop zero-filling completely?

Hmm... not sure why. It was there since the original commit that introduced the file (c037383aff81a61ed956858353ec003e970fb2ce). I don't see the point actually, and when I removed the zero_fill_size and zero filling with memset(), there was no regression in the lldb test suite. I guess we can remove it completely.

djtodoro updated this revision to Diff 317873.Jan 20 2021, 7:30 AM
djtodoro edited the summary of this revision. (Show Details)

The fact that the test suite passes is not too surprising since we have very few core file tests. However, I agree that we should just remove this zero filling completely.

Regarding the test, would it be possible to reuse one of the existing core files? (The reason we have so few core tests is because we used to not allow checked in core files at all -- now we kinda do, but it's still not ideal.) I'm guessing you don't even need to disassemble a function to reproduce this -- it should be sufficient to run disassemble --start-address X --end-address Y, where the region X--Y crosses a core segment boundary..

Regarding the test, would it be possible to reuse one of the existing core files? (The reason we have so few core tests is because we used to not allow checked in core files at all -- now we kinda do, but it's still not ideal.) I'm guessing you don't even need to disassemble a function to reproduce this -- it should be sufficient to run disassemble --start-address X --end-address Y, where the region X--Y crosses a core segment boundary..

We can disassemble a region that crosses core segment boundary, e.g.:
(let's use lldb/test/API/functionalities/postmortem/elf-core/linux-x86_64.core)

(lldb) disassemble --start 0x400161 --end 0x40100c
  0x400161: addb   %al, (%rax)
  0x400163: addb   %al, (%rax)
  0x400165: addb   %al, (%rax)
  0x400167: addb   %dl, (%rax,%rax)
  0x40016a: addb   %al, (%rax)
  0x40016c: addb   %al, (%rax)
  0x40016e: addb   %al, (%rax)
  0x400170: addl   %edi, 0x52(%rdx)
  0x400173: addb   %al, (%rcx)
  0x400175: js     0x400187
  0x400177: addl   %ebx, (%rbx)
  0x400179: orb    $0x7, %al
  0x40017b: orb    %dl, 0x1c000001(%rax)
  0x400181: addb   %al, (%rax)
  0x400183: addb   %bl, (%rax,%rax)
  0x400186: addb   %al, (%rax)
  0x400188: testb  %bh, %bh

and this triggers the code for zero-bytes-filling. After applying this patch, the output of the command is exactly the same.
I am not sure what should we check in the test if we use this existing core-file, any idea?

I think that's because lldb's dissassembler currently just stops when it encounters an unknown/invalid instruction :(, so it doesn't even get to the interesting part. If I skip over the random bytes I get:

(lldb) disassemble --start 0x400ff0 --end 0x40100c
    0x400ff0: addb   %al, (%rax)
    0x400ff2: addb   %al, (%rax)
    0x400ff4: addb   %al, (%rax)
    0x400ff6: addb   %al, (%rax)
    0x400ff8: addb   %al, (%rax)
    0x400ffa: addb   %al, (%rax)
    0x400ffc: addb   %al, (%rax)
    0x400ffe: addb   %al, (%rax)
    0x401000: addb   %al, (%rax)
    0x401002: addb   %al, (%rax)
    0x401004: addb   %al, (%rax)
    0x401006: addb   %al, (%rax)
    0x401008: addb   %al, (%rax)
    0x40100a: addb   %al, (%rax)

With your patch I guess this would stop at 0x400ffe.

Another option would be to ditch disassembling, and check this via memory reads, as that is what you are actually fixing:

(lldb) memory read 0x400ff0 -c 20
0x00400ff0: 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
0x00401000: 00 00 00 00                                      ....

Another option would be to ditch disassembling, and check this via memory reads, as that is what you are actually fixing

+1, nice! thanks!

djtodoro updated this revision to Diff 320483.Feb 1 2021, 8:35 AM
  • Test update
labath accepted this revision.Feb 4 2021, 1:56 AM

cool

This revision is now accepted and ready to land.Feb 4 2021, 1:56 AM

@labath Thanks for the review! I'll land this tomorrow.

This revision was automatically updated to reflect the committed changes.