Add:
- RVM and RVA instructions sets.
- corresponding unittests.
Further work:
- implement RVC, RVF, RVD, and RVV extension.
Differential D133670
[LLDB][RISCV] Add RVM and RVA instruction support for EmulateInstructionRISCV Emmmer on Sep 11 2022, 8:28 AM. Authored by
Details Add:
Further work:
Diff Detail
Event TimelineComment Actions General comment on all this, and forgive me if this sounds outlandish but are we going to end up with a complete riscv emulator here? Given that until hardware single step arrives we're going to have to emulate any possible instruction that we might breakpoint. Or am I mistaken here? If it is the case that we're going to end up with almost every instruction here, I don't necessarily object on a technical level. Without spec updates there is no alternative. However, I do think that it would be worth getting someone who works on riscv daily to review for instruction correctness. I've written emulators before and getting one instruction slightly off can be a nightmare to find after the fact. Ideally you could run a whole pile of programs single stepped and verify their results but obviously that is easier said than done. Something like csmith can give you programs, but not ones with specific results. Might find some crashes though if you got some memory operands wrong somewhere. Anyway, just some ideas because I am concerned about how sustainable this effort is (and how much time it'll cost you to get right).
Comment Actions Do riscv extensions come with any kind of assembly self test suites? If such a thing existed you could single step those as a way to test this sort of thing. Comment Actions fwiw we had to do this for the earliest iphones, where there were no hardware breakpoints and no instruction single step. We had a tiny engine that could calculate the address of the next instruction on those armv6 chips. It was pretty limited in what it needed to be able to do, but I guess extending the instruction emulation engine to support this use case is reasonable. It really only needs to emulate instructions well enough to track where registers are spilled to stack, saved in other registers, etc, for unwinding. After one or two revisions, we got hardware breakpoints and were able to set a "stop when pc != current address" which was effectively single step. Then in ARMv8 AArch32/AArch64 gave us genuine instruction step and that's what we use now. Comment Actions That was poorly worded. The emulation engine right now only needs to track side effects relevant to unwind. It doesn't track control flow side effects of instructions, short of branches within a function where we "forward" the unwind state to the target instruction. The instruction emulation engine (today) runs over a function statically, and generates an intermediate representation of the unwind state at every instruction point. What you'd need for instruction single step is a similar intermediate representation that expressed branch behavior for every branchable instruction. Does this instruction depend on a condition flag? Does it branch to an address in a register? That kind of thing. Or you can do the approach debugserver took so many years ago, where we are stopped at a specific instruction, and we decode it well enough to know where it will execute next, no intermediate representation of those details. All of this assuming David is correct, that this is the problem being looked at. Comment Actions
And thinking about it further, the whole point of emulating all of the instructions in a function is that we need the full sequence to generate accurate unwind records. If we're only interested in what the next instruction address will be, we need the current register context and know how to decode one instruction. It's a bit of a different problem. Comment Actions Yes, riscv does not currently implement hardware breakpoints or instruction single step. Almost every instruction in riscv is able to read and write to registers (rs1, rs2, and rd), so for those developers who wish to control every register, it is necessary to simulate every instruction completely. For example, after an instruction that changes a register, the program may use the JALR to jump to the address stored in that register. For now we have few method to test a program, the best way is using unittest to cover this emulator. Yes, we can try to use this test suite https://github.com/riscv-software-src/riscv-tests Comment Actions
So can most instructions write to the PC as one of the registers? I see that if you were on the following instruction, you'd have to emulate it to know the place it jumps to: add pc, #4 On AArch64 the PC is often not allowed so if we were to emulate there, it cuts down the work needed a lot. With the jalr isn't it more like: add r1, #4 jalr r1 So we could (or do) put a breakpoint on the jalr because we know that the add doesn't do a branch itself. Once we hit the jalr we can read registers to get r1 and know where to break next. Meaning for cases like that you only need a "can this branch" rather than working out the whole thing. Of course if almost any instruction can write to the pc, "can branch" is almost always true.
Cool! Well, I'm not requiring you to do that but keep it in mind if you end up spending a lot of time bug fixing this code. On the review side, @asb can you reccomend anyone who would be able to review the riscv specifics? (just because I remember you have done a lot of work on riscv) (again not blocking this change on that, but I think it would be more sustainable going forward if there was a riscv specific reviewer too) Comment Actions PC isn't exposed as a GPR on RISC-V, only explicit branch/jump instructions can alter control flow Comment Actions
Thanks. In which case does that mean we only need to emulate instructions that would alter control flow? Perhaps that is technically true but not a good experience in practice. I will put some time into reading exactly how lldb implements software single step but in the meantime maybe @jasonmolenda will know from experience.
Comment Actions Mostly grumbling about optional from me :) Once that is addressed I think that'll be it. For future patches please keep the number of added instructions small (many patches is fine, just keep each review manageable) and again, if you can find a riscv expert to review the specifics of the instructions that would be a massive help.
Comment Actions Sorry for the inconvenience, I will pay attention to the size of the patch next time.
Comment Actions One idea I had to help review future patches, if the authors of the llvm support for the extension are still active you could add them as reviewers. I'm happy to continue reviewing of course, just aware that I'm not doing the most in depth job here. This change LGTM.
Comment Actions Hey @Emmmer the unit test is throwing a UBSAN failure, could you please take a look? Stacktrace Script(shard): -- GTEST_OUTPUT=json:/Users/buildslave/jenkins/workspace/lldb-cmake-sanitized/lldb-build/tools/lldb/unittests/Instruction/./EmulatorTests-lldb-unit-48390-3-26.json GTEST_SHUFFLE=1 GTEST_TOTAL_SHARDS=26 GTEST_SHARD_INDEX=3 /Users/buildslave/jenkins/workspace/lldb-cmake-sanitized/lldb-build/tools/lldb/unittests/Instruction/./EmulatorTests -- Note: This is test shard 4 of 26. Note: Randomizing tests' orders with a seed of 42603 . [==========] Running 1 test from 1 test suite. [----------] Global test environment set-up. [----------] 1 test from RISCVEmulatorTester [ RUN ] RISCVEmulatorTester.TestAtomicSequence /Users/buildslave/jenkins/workspace/lldb-cmake-sanitized/llvm-project/lldb/unittests/Instruction/RISCV/TestRISCVEmulator.cpp:254:3: runtime error: store to misaligned address 0x61b000001034 for type 'uint64_t' (aka 'unsigned long long'), which requires 8 byte alignment 0x61b000001034: note: pointer points here af 27 04 10 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 ^ SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior /Users/buildslave/jenkins/workspace/lldb-cmake-sanitized/llvm-project/lldb/unittests/Instruction/RISCV/TestRISCVEmulator.cpp:254:3 in -- exit: -6 -- shard JSON output does not exist: /Users/buildslave/jenkins/workspace/lldb-cmake-sanitized/lldb-build/tools/lldb/unittests/Instruction/./EmulatorTests-lldb-unit-48390-3-26.json |
Try using llvm::Optional<T> for this, it will save having to remember to make success false before calling the function.
And if you're not going to do that, at least make success a bool& (even if you need to take its address to pass down to ReadMemoryUnsigned).