This is an archive of the discontinued LLVM Phabricator instance.

[ARM64] Cleanup and speedup NativeRegisterContextLinux_arm64
ClosedPublic

Authored by omjavaid on Oct 24 2019, 2:25 AM.

Details

Summary

This patch simplifies register accesses in NativeRegisterContextLinux_arm64 and also adds some bare minimum caching to avoid multiple calls to ptrace during a stop.

Linux ptrace returns data in the form of structures containing GPR/FPR data. This means that one single call is enough to read all GPRs or FPRs. We do that once per stop and keep reading from or writing to the buffer that we have in NativeRegisterContextLinux_arm64 class. Before a resume or detach we write all buffers back.

This is tested on aarch64 thunder x1 with Ubuntu 18.04. Also tested regressions on x86_64.

Diff Detail

Event Timeline

omjavaid created this revision.Oct 24 2019, 2:25 AM

The idea seems nice, however I am wondering, if it is not going a bit too far. In not sure of the exact situation on arm64, but in general, there are registers which only accept some bit patterns. What exactly does ptrace do in this situation? Postponing the write until we resume/detach means we lose the opportunity to report any errors that would result from the ptrace call.

What's the exact case you're optimising for? Would it be enough to just cache the reads, but then as soon as we write something, we immediately call ptrace to ensure that the value was written successfully? It sounds like this could be enough as the majority of the operations should be reads...

source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.h
153–154 ↗(On Diff #226214)

Please move these next to the members they are guarding.

omjavaid marked an inline comment as done.EditedOct 25 2019, 4:15 AM

We ll be dealing with Linux user mode and mostly aarch64 data registers except for cpsr, fpsr and fpcr. I think we should be fine but let me confirm this again from documentation.

Further more, motivation behind these changes is SVE register set which may exchange high volume of data due to large register sizes.

source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.h
153–154 ↗(On Diff #226214)

Alright.

We ll be dealing with Linux user mode and mostly aarch64 data registers except for cpsr, fpsr and fpcr. I think we should be fine but let me confirm this again from documentation.

Right, but you're creating a general interface for all architectures, not just several aarch64 registers. Even if they don't make use of that facility now, it would be good to make sure they can do that in the future.

For instance, on x86, the kernel may decide to reject https://github.com/torvalds/linux/blob/master/arch/x86/kernel/ptrace.c#L187 some values of some registers, and silently ignore some bits in others https://github.com/torvalds/linux/blob/master/arch/x86/kernel/ptrace.c#L349. That's why I think it would be better to commit changes to memory automatically/immediately, and minimize the chances that subsequent "read" operations will return data which does not reflect the actual values held by the OS.

omjavaid updated this revision to Diff 231408.Nov 28 2019, 4:53 AM

Sorry for late response on this. I got side tracked into other issues and wanted performance testing before writing an update.

I agree with @labath about making caching for limited to reads only and register writes are written all the way in latest update.

Also according to performance tests these changes show no significant improvement on real hardware but if underlying hardware is a simulator like QEMU some improvement does occur. This is the main reason I am keeping these changes intact for register reads but have removed for register writes.

I have tested this patch on Ubuntu Bionic running on QEMU and Thunder X1 arm64 server.

LGTM?

Thanks. I now agree with this change in principle, but I don't think its ready for an LGTM yet.

The main thing I'd like to discuss is this InvalidateAllRegisters function. I think we should come up with a story of when is this function expected to be called, and document it somewhere. Right now, the placement of call sites seems pretty random. For instance, it's not clear to me why one would need to call it in NativeProcessLinux::SetupSoftwareSingleStepping.

The reason we want to call this function is because the values that the kernel holds for this thread have (potentially) changed, and so we want to ensure we don't hold stale values. The way I see it, the values can change when:
a) we do a "register write" operation. It looks like you have this part covered, and the invalidation happens pretty close to the actual "write" syscall
b) when the thread gets a chance to run. I think this what the other InvalidateAllRegisters are trying to cover, but it's hard to tell because they are very far from the place where the actual thread gets resumed. In particular, the call in NativeProcessLinux::Resume seems troublesome, as it will call NativeThreadLinux::Resume *after* it "invalidates" the caches. However, NativeProcessLinux::Resume is responsible for setting the watchpoints, and this involves register read/writes, which may end up re-populating some caches. I guess that doesn't matter on arm because of how watchpoints are set, but I think it would make it easier to reason about these things if the invalidation happened right before we resume the thread, or immediately after it stops.

lldb/include/lldb/Host/common/NativeRegisterContext.h
112–113 ↗(On Diff #231408)

Maybe this could be a method on NativeRegisterContextLinux. We can always later lift it to the main class if we find a reason to do that...

lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux_arm64.cpp
278–281

There's nothing happening after this if block, so you could write this so that it always returns (return WriteGPR()), and then drop the "else" from the next condition.

307–308

I guess this isn't needed, since Write[FG]PR already handle invalidation on their own?

878

Please change this to an early return (as well as the ReadGPR above).

omjavaid updated this revision to Diff 232501.Dec 5 2019, 11:39 PM

Hi

I have updated this patch after incorporating previous suggestions like moving InvalidateAllRegisters declaration to NativeRegisterContextLinux and also relocated calls to InvalidateAllRegisters from NativeProcessLinux to NativeThreadLinux. We now call InvalidateAllRegisters just before we resume or single step via ptrace. Other than that on any register write register will be be invalidated by WriteGPR and WriteFPR as was happening previously. We dont really need a invalidate on detach because thread wont exist after that so I have removed it InvalidateAllRegisters which was being called at detach.

Also testing for GPR or FPR validity now happens outside WriteGPR and WriteFPR.

Thoughts?

labath accepted this revision.Dec 6 2019, 12:54 AM

I think this looks good now. Thanks for your patience.

This revision is now accepted and ready to land.Dec 6 2019, 12:54 AM
labath added inline comments.Dec 6 2019, 1:03 AM
lldb/source/Plugins/Process/Linux/NativeRegisterContextLinux.h
32

no semicolon here

This revision was automatically updated to reflect the committed changes.
Herald added a project: Restricted Project. · View Herald TranscriptDec 6 2019, 9:25 AM