This is an archive of the discontinued LLVM Phabricator instance.

Set orig_eax to -1 for Linux x86 platforms
ClosedPublic

Authored by ravitheja on Jul 22 2015, 5:28 AM.

Details

Summary

For Linux x86 based environments the orig_eax/orig_rax
register should be set to -1 to prevent the instruction pointer
to be decremented, which was the cause for the SIGILL exception.

Fix for Bug 23659

Diff Detail

Event Timeline

ravitheja updated this revision to Diff 30343.Jul 22 2015, 5:28 AM
ravitheja retitled this revision from to Set orig_eax to -1 for Linux x86 platforms.
ravitheja updated this object.

Ravi, thanks for working on this.

Greg, could you please take a look at this?

The general story is that on x86 linux we have a special pseudo-register, which controls what happens when we resume a process which was interrupted in a syscall. When evaluating an expression, we need to set it to -1 to avoid nasty surprises. Doing this inside a ThreadPlan doesn't sound like a good idea, but I am don't know enough about the codebase to say where it should go: ABI plugin? Platform?

source/Target/ThreadPlanCallFunction.cpp
189 ↗(On Diff #30343)

This loop can be replaced by reg_ctx->GetRegisterInfo(eRegisterKindLLDB, lldb_orig_rax_x86_64) (and similarly for i386), right?

clayborg resigned from this revision.Jul 29 2015, 10:44 AM
clayborg removed a reviewer: clayborg.

Please add Jim Ingham as a reviewer. He will have the best input on the subject.

source/Target/ThreadPlanCallFunction.cpp
188–200 ↗(On Diff #30343)

Don't you actually know it should be "orig_eax" by if it is i386 or "orig_rax" for x86_64?

Best to use:

const RegisterInfo *reg_info = reg_ctx->GetRegisterInfoByName ("orig_eax");
if (reg_info == nullptr)
    reg_info = reg_ctx->GetRegisterInfoByName ("orig_rax");

instead of your for loop.

But this isn't the way to do this. This seems like some sort of thing that should be Process::PrepareForExpression(), or Platform::PrepareForExpression(). We might then need a Process::CleanupFromExpression()/Platform::CleanupFromExpression() to cleanup?

The main questions is why is there an extra decrement of the PC? Is there a breakpoint being set on the return address and then that breakpoint is being removed before we detect it was there and we backup the PC incorrectly? Can you elaborate on 23659 a bit?

The main questions is why is there an extra decrement of the PC? Is there a breakpoint being set on the return address and then that breakpoint is being removed before we detect it was there and we backup the PC incorrectly? Can you elaborate on 23659 a bit?

This is pretty crazy stuff. The extra decrement happens inside the kernel when we resume the process. This is how syscall restart is implemented on linux. When we stop the inferior inside a syscall (let's say we interrupt a sleep() for example), the kernel presents us with the following view:

previous instructions
syscall
next instruction <--- PC

However, when the process resumes, the kernel will actually decrement the PC by 2 before letting it run. This will cause the process to re-execute the syscall instruction, which will give the kernel a chance to finish the syscall. All this is fine unless we actually move the PC, like we do when evaluating expressions, since then the extra decrement will likely land us in the middle of an instruction. This behavior is controlled by the orig_rax pseudo-register.

I don't think we need to do any special cleanup after this, since this "register" should get saved when we save all registers and then restored. Ravi, can you confirm this?

ravitheja updated this revision to Diff 31014.Jul 30 2015, 3:18 AM

Changes for accessing RegiserInfo
Replaced for loop with GetRegisterInfoByName to access
orig_eax/orig_rax register.

The main questions is why is there an extra decrement of the PC? Is there a breakpoint being set on the return address and then that breakpoint is being removed before we detect it was there and we backup the PC incorrectly? Can you elaborate on 23659 a bit?

This is pretty crazy stuff. The extra decrement happens inside the kernel when we resume the process. This is how syscall restart is implemented on linux. When we stop the inferior inside a syscall (let's say we interrupt a sleep() for example), the kernel presents us with the following view:

previous instructions
syscall
next instruction <--- PC

However, when the process resumes, the kernel will actually decrement the PC by 2 before letting it run. This will cause the process to re-execute the syscall instruction, which will give the kernel a chance to finish the syscall. All this is fine unless we actually move the PC, like we do when evaluating expressions, since then the extra decrement will likely land us in the middle of an instruction. This behavior is controlled by the orig_rax pseudo-register.

I don't think we need to do any special cleanup after this, since this "register" should get saved when we save all registers and then restored. Ravi, can you confirm this?

Yes, the explanation is correct, its more a Linux kernel implementation thing and no special cleanup is required. I did check before submitting for review that, after doing the expression evaluation the program resumes the interrupted syscall when the continue command is issued, so the registers are saved.

clayborg requested changes to this revision.Jul 30 2015, 10:26 AM
clayborg added a reviewer: clayborg.

So we have something similar on MacOSX where if a thread is in the kernel doing a syscall we have to ask the thread to abort safely.

So you can actually fix this in lldb-server by doing what you need to do in response to the:

< 37> send packet: $QSaveRegisterState;thread:8edbb7;#4a
< 5> read packet: $1#00

When you see this "QSaveRegisterState" packet, you can do what you need to do for a thread. When the expression is done, you will get the following packet:

< 42> send packet: $QRestoreRegisterState:1;thread:8edbb7;#0a
< 6> read packet: $OK#00

Then you can restore the registers however you need to. So no changes should be required on the LLDB side. Just in lldb-server.

This revision now requires changes to proceed.Jul 30 2015, 10:26 AM

@clayborg this patch was done in context of Bug 23659 which was reported on Linux, could you please check if the Bug is applicable to MacOS as well ?

jingham edited edge metadata.Jul 31 2015, 2:47 PM

On Mac OS X when we call a function on a thread we have to force the thread into a safe state before we can call functions on it. This has the effect of interrupting any system call pending on that thread, but we have to do it or we would end up corrupting register state (not the PC but also any of the other registers that were changed in the kernel but not reflected in the user thread state.)

So we don't see the bug you are seeing. We see the undesirable but unavoidable behavior that calling a function on a thread sitting in a syscall will cause that syscall to be interrupted.

Anyway, the point relevant to this bug is that lldb offloads all the "storing of state before running a function on a thread" and "restoring said state" to debugserver, which it implements in response to the QSaveRegisterState, QRestoreRegisterState packets. You could do the same thing, and hide the Linux specific gnarliness to your server. This packet is also more efficient since you just send a token back and forth rather than a complete copy of the register set in both directions.

labath added a comment.Aug 3 2015, 7:25 AM

The only issue I have with doing this on the server is that the name of the packet (QSaveRegisterState) gives no indication that it is going to change the state of the inferior in any way (ok, maybe except the capital Q). However, the description does state that it is used to prepare for expression evaluation and it matches the OSX behavior most closely, so I guess that could be the way to go.

ravitheja updated this revision to Diff 31237.Aug 3 2015, 8:37 AM
ravitheja edited edge metadata.

Moved implementation to lldb-server

ravitheja updated this revision to Diff 31239.Aug 3 2015, 8:47 AM
ravitheja edited edge metadata.

Correcting previous mistake
In last update, accidentally few files were excluded.
I am readding them

ravitheja updated this revision to Diff 31240.Aug 3 2015, 8:52 AM

Sorry again.

I am sorry for those accidents, I have verified the present diff and it is correct.

clayborg requested changes to this revision.Aug 10 2015, 9:47 AM
clayborg edited edge metadata.

It would be great to be able to make this change somewhere lower in the chain without having to define the extra orig_eax register. No one should have to see this register and the implementation should be in the x86_64 and i386 specific code. See inline comments and make new virtual functions in NativeRegisterContext to save and restore register state, move the existing code from GDBRemoteCommunicationServerLLGS::Handle_QSaveRegisterState() that gets all the registers into a buffer and move it to the NativeRegisterContext::SaveRegisterState(...). If we can, then much of this patch goes away.

source/Plugins/Process/Linux/NativeRegisterContextLinux_x86_64.cpp
52 ↗(On Diff #31240)

No one should ever need to see this register, hopefully we can not declare it and do the work in NativeRegisterContext::ReadAllRegisterValues()?

162 ↗(On Diff #31240)

No one should ever need to see this register, hopefully we can not declare it and do the work in NativeRegisterContext::ReadAllRegisterValues()?

source/Plugins/Process/Utility/RegisterContextLinux_i386.cpp
29 ↗(On Diff #31240)

Should be fine to have in this structure, but no one outside of this class should know about this register.

source/Plugins/Process/Utility/RegisterContextLinux_x86_64.cpp
36 ↗(On Diff #31240)

Should be fine to have in this structure, but no one outside of this class should know about this register.

source/Plugins/Process/Utility/RegisterInfos_i386.h
117–119 ↗(On Diff #31240)

No one outside of this class should know about this register, so remove this.

source/Plugins/Process/Utility/RegisterInfos_x86_64.h
128–130 ↗(On Diff #31240)

No one outside of this class should know about this register, so remove this.

336–338 ↗(On Diff #31240)

No one outside of this class should know about this register, so remove this.

source/Plugins/Process/Utility/lldb-x86-register-enums.h
39–41 ↗(On Diff #31240)

Hopefully we can hide this and not define this.

157–159 ↗(On Diff #31240)

Hopefully we can hide this and not define this.

source/Plugins/Process/gdb-remote/GDBRemoteCommunicationServerLLGS.cpp
2633–2657

Move this code into the x86_64 and i386 versions of NativeRegisterContext::ReadAllRegisterValues(...). which is called on line 2615 of this file.

This revision now requires changes to proceed.Aug 10 2015, 9:47 AM
ravitheja added a comment.EditedAug 11 2015, 5:09 AM

@clayborg
The register orig_eax/orig_rax could be important in some debugging scenarios and the user may want to see its value (one particular use case is the investigation of this bug), with the present changes the user can directly print the value of this register.
Secondly the register offsets calculation is done in RegisterInfos_x86_64.h and RegisterInfos_i386.h, in case if I drop those changes I need to then hardcode the register offset (or something close). The actual settings of the register could be moved to a place lower in the chain as u suggested. It would be great if you could clarify on these points i.e ->

  1. Whether we want the user to be able to print this register during debugging.
  2. Remove the offset calculation in RegisterInfos_x86_64.h and RegisterInfos_i386.h and do something like hardcoding the register offset for orig_rax/eax (plus the enums in lldb-x86-register-enums.h are needed otherwise there are assertions for the mismatch in the number of registers).
  3. The actual settings of the register can be moved to a place lower in the chain without disturbing the other changes.

Comments for your questions:

  • Whether we want the user to be able to print this register during debugging.

If this is purely an implementation detail, then we shouldn't show it to the user. The user shouldn't know or care about this register. Furthermore you don't want them to start relying on having it be around in case you take it away later after the kernel changes how it handles such things.

  • Remove the offset calculation in RegisterInfos_x86_64.h and RegisterInfos_i386.h and do something like hardcoding the register offset for orig_rax/eax (plus the enums in lldb-x86-register-enums.h are needed otherwise there are assertions for the mismatch in the number of registers).

This could always be stored at the end of all the registers at a dynamic offset. Or you could always place it at offset zero.

  • The actual settings of the register can be moved to a place lower in the chain without disturbing the other changes.

Lets try to hide it completely if we can. Who knows what will change in the future and if certain linux kernel versions will do different things, so the less the user knows about implementation details keeps our debugger doing what it should with no distractions.

You also commented on it being nice to see when debugging this issue: this can and should be handled by logging.

ravitheja updated this revision to Diff 31953.Aug 12 2015, 9:52 AM
ravitheja edited edge metadata.

Changes for previous feedback

clayborg accepted this revision.Aug 12 2015, 10:24 AM
clayborg edited edge metadata.

Looks good.

This revision is now accepted and ready to land.Aug 12 2015, 10:24 AM
ravitheja closed this revision.Aug 13 2015, 2:05 AM