For Linux x86 based environments the orig_eax/orig_rax
register should be set to -1 to prevent the instruction pointer
to be decremented, which was the cause for the SIGILL exception.
Fix for Bug 23659
Differential D11411
Set orig_eax to -1 for Linux x86 platforms ravitheja on Jul 22 2015, 5:28 AM. Authored by
Details For Linux x86 based environments the orig_eax/orig_rax Fix for Bug 23659
Diff Detail Event TimelineComment Actions Ravi, thanks for working on this. Greg, could you please take a look at this? The general story is that on x86 linux we have a special pseudo-register, which controls what happens when we resume a process which was interrupted in a syscall. When evaluating an expression, we need to set it to -1 to avoid nasty surprises. Doing this inside a ThreadPlan doesn't sound like a good idea, but I am don't know enough about the codebase to say where it should go: ABI plugin? Platform?
Comment Actions Please add Jim Ingham as a reviewer. He will have the best input on the subject.
Comment Actions The main questions is why is there an extra decrement of the PC? Is there a breakpoint being set on the return address and then that breakpoint is being removed before we detect it was there and we backup the PC incorrectly? Can you elaborate on 23659 a bit? Comment Actions This is pretty crazy stuff. The extra decrement happens inside the kernel when we resume the process. This is how syscall restart is implemented on linux. When we stop the inferior inside a syscall (let's say we interrupt a sleep() for example), the kernel presents us with the following view: previous instructions syscall next instruction <--- PC However, when the process resumes, the kernel will actually decrement the PC by 2 before letting it run. This will cause the process to re-execute the syscall instruction, which will give the kernel a chance to finish the syscall. All this is fine unless we actually move the PC, like we do when evaluating expressions, since then the extra decrement will likely land us in the middle of an instruction. This behavior is controlled by the orig_rax pseudo-register. I don't think we need to do any special cleanup after this, since this "register" should get saved when we save all registers and then restored. Ravi, can you confirm this? Comment Actions Changes for accessing RegiserInfo Comment Actions Yes, the explanation is correct, its more a Linux kernel implementation thing and no special cleanup is required. I did check before submitting for review that, after doing the expression evaluation the program resumes the interrupted syscall when the continue command is issued, so the registers are saved. Comment Actions So we have something similar on MacOSX where if a thread is in the kernel doing a syscall we have to ask the thread to abort safely. So you can actually fix this in lldb-server by doing what you need to do in response to the: < 37> send packet: $QSaveRegisterState;thread:8edbb7;#4a When you see this "QSaveRegisterState" packet, you can do what you need to do for a thread. When the expression is done, you will get the following packet: < 42> send packet: $QRestoreRegisterState:1;thread:8edbb7;#0a Then you can restore the registers however you need to. So no changes should be required on the LLDB side. Just in lldb-server. Comment Actions @clayborg this patch was done in context of Bug 23659 which was reported on Linux, could you please check if the Bug is applicable to MacOS as well ? Comment Actions On Mac OS X when we call a function on a thread we have to force the thread into a safe state before we can call functions on it. This has the effect of interrupting any system call pending on that thread, but we have to do it or we would end up corrupting register state (not the PC but also any of the other registers that were changed in the kernel but not reflected in the user thread state.) So we don't see the bug you are seeing. We see the undesirable but unavoidable behavior that calling a function on a thread sitting in a syscall will cause that syscall to be interrupted. Anyway, the point relevant to this bug is that lldb offloads all the "storing of state before running a function on a thread" and "restoring said state" to debugserver, which it implements in response to the QSaveRegisterState, QRestoreRegisterState packets. You could do the same thing, and hide the Linux specific gnarliness to your server. This packet is also more efficient since you just send a token back and forth rather than a complete copy of the register set in both directions. Comment Actions The only issue I have with doing this on the server is that the name of the packet (QSaveRegisterState) gives no indication that it is going to change the state of the inferior in any way (ok, maybe except the capital Q). However, the description does state that it is used to prepare for expression evaluation and it matches the OSX behavior most closely, so I guess that could be the way to go. Comment Actions Correcting previous mistake Comment Actions It would be great to be able to make this change somewhere lower in the chain without having to define the extra orig_eax register. No one should have to see this register and the implementation should be in the x86_64 and i386 specific code. See inline comments and make new virtual functions in NativeRegisterContext to save and restore register state, move the existing code from GDBRemoteCommunicationServerLLGS::Handle_QSaveRegisterState() that gets all the registers into a buffer and move it to the NativeRegisterContext::SaveRegisterState(...). If we can, then much of this patch goes away.
Comment Actions @clayborg
Comment Actions Comments for your questions:
If this is purely an implementation detail, then we shouldn't show it to the user. The user shouldn't know or care about this register. Furthermore you don't want them to start relying on having it be around in case you take it away later after the kernel changes how it handles such things.
This could always be stored at the end of all the registers at a dynamic offset. Or you could always place it at offset zero.
Lets try to hide it completely if we can. Who knows what will change in the future and if certain linux kernel versions will do different things, so the less the user knows about implementation details keeps our debugger doing what it should with no distractions. You also commented on it being nice to see when debugging this issue: this can and should be handled by logging. |
This loop can be replaced by reg_ctx->GetRegisterInfo(eRegisterKindLLDB, lldb_orig_rax_x86_64) (and similarly for i386), right?