This is an archive of the discontinued LLVM Phabricator instance.

[LLDB][RISCV] Make software single stepping work
ClosedPublic

Authored by Emmmer on Aug 12 2022, 3:44 AM.

Details

Summary

Add:

  • EmulateInstructionRISCV, which can be used for riscv32 and riscv64.
  • Add unittests for EmulateInstructionRISCV.

Note: Compressed instructions set (RVC) was still not supported in this patch.

Diff Detail

Event Timeline

Emmmer created this revision.Aug 12 2022, 3:44 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 12 2022, 3:44 AM
Emmmer requested review of this revision.Aug 12 2022, 3:44 AM

Initial reactions, since I won't get time to review this today.

EmulateInstructionRISCV, which can be used for riscv32 and riscv64.

I know that ARM has tests for EmulateInstructionARM, and I would like to know this is tested. Either the way that does it or with unittests.

handle resume thread error.
Fix possible nullptr dereference.

Are these two generic changes (not specific to riscv)? If so can you make a patch for each. If it can be tested as well then great, if it's some theoretical corner case then should be fine without.

I presume that with this change a bunch of tests started passing? Would be good to summarise that in the commit message. "after this change a further N tests that use single stepping now pass".

Emmmer updated this revision to Diff 452909.Aug 16 2022, 1:27 AM
Emmmer edited the summary of this revision. (Show Details)

Address review comments:

  • Add unittests for EmulateInstructionRISCV.
  • Split "thread error" and "nullptr dereference" as separate PRs.
DavidSpickett added inline comments.Aug 16 2022, 2:32 AM
lldb/source/Plugins/Instruction/RISCV/EmulateInstructionRISCV.cpp
36

Move this next to first use if possible.

38

What part of the encoding does this help you extract? Can you add a comment with the name of the field as the spec calls it.

39

Also what does this mean?

83

Here we know that reg_encode is not 0, and reg_encode is unsigned so it cannot be negative. So it could just be if (reg_encode <= 31).

109

Does compressed come into this at any point? Just checking since I see a lot of +4. It's fine to say it's not supported at this time.

120

JALR clears the bottom bit, why is this?

On Arm thumb we have bx which can change modes between arm and thumb code. In thumb code the bottom bit is used as a mode bit to say you're in thumb. Is it anything like that?

Please add comments to explain.

126

Is funct3 the name of a field from the spec? If so can you add a comment like "funct3 is the type of the compare", whatever it means.

139

Do you ever expect to hit this? Add an assert if you don't.

146

Nit: put this next to the ReadPC call.

Generally, order the declarations as they are used.

150

Do this check one line up immediately after the ReadPC.

164

I would just merge these into the CompareB below.

182

Can you explain in the comments what RVI and RVA are?

183

What is not certain at this point? It would be good to record that at any point where the spec being in flux is an issue.

E.g. "this code may change once the spec is ratified, at the moment X and Y aspects are not fixed"

Just in case we wanted to fixup individual bits as they get decided on later (or find out where we diverge from the final spec).

188

Just put the array_lengthof call in the for line.

190

I would write this inst & pat.type_mask. I know it works either way but it looks a bit strange in my opinion.

198

0x%x and does not branch:

223

This comment seems redundant given that the function is called DecodeAndExecute.

250

Please add comments to explain this.

I think what you're doing is checking whether the compressed encoding could be valid. We do similair things for Thumb encodings for Arm.

251

Seems easier to write this as != 3.

260

I would change this into an early return style like this:

  bool success = false;
  m_addr = ReadPC(&success);
  if (!success)
    m_addr = LLDB_INVALID_ADDRESS;

    Context ctx;
    ctx.type = eContextReadOpcode;
    ctx.SetNoArgs();
    uint32_t inst = (uint32_t)ReadMemoryUnsigned(ctx, m_addr, 4, 0, &success);
    uint16_t try_rvc = (uint16_t)(inst & 0x0000ffff);
    uint16_t mask = try_rvc & 0b11;
    if (try_rvc != 0 && (mask == 0 || mask == 1 || mask == 2)) {
      m_opcode.SetOpcode16(try_rvc, GetByteOrder());
    } else {
      m_opcode.SetOpcode32(inst, GetByteOrder());
    }

  return true;
}
lldb/source/Plugins/Instruction/RISCV/EmulateInstructionRISCV.h
56

Do we need this wrapper/forwarder? It just passes on the arg unchanged.

lldb/unittests/Instruction/CMakeLists.txt
4 ↗(On Diff #452909)

Also for AArch64 this should be "AARCH64" or "ARM64" (I think it's the former).

In this context ARM == 32 bit Arm and AArch64 == 64 bit Arm (where elsewhere Arm means both, and I agree, it's confusing).

16 ↗(On Diff #452909)

Why do we need this conditional logic here?

Surely unittests don't need a riscv or arm64 host and shouldn't clash with each other, or did you find some issue including both?

I know this is a flood of comments but the overall impression is that this is mostly fine. A bunch of small things.

Thanks for splitting out the other patches and adding the test cases here.

lldb/source/Plugins/Instruction/RISCV/EmulateInstructionRISCV.cpp
299

Assert here if you do not expect to hit this code. (even if that's just temporary until you emulate more instructions, it's better to know that something needs implementing)

lldb/source/Plugins/Instruction/RISCV/EmulateInstructionRISCV.h
21

You have 2 public markers in the same class with no private between them. Intentional?

lldb/source/Plugins/Process/Linux/NativeProcessLinux.cpp
886

Could you use isRISCV here instead?

lldb/unittests/Instruction/RISCV/TestRISCVEmulator.cpp
9 ↗(On Diff #452909)

This appears unused.

39 ↗(On Diff #452909)

Please explain what this does. Just on the face of it, something a bit dodgy :) (maybe there is a better way to express it, even if it is legit)

45 ↗(On Diff #452909)
if (reg == gpr_x0_riscv)
      reg_value.SetUInt(0, reg_info->byte_size);
else
    reg_value.SetUInt(tester->gpr.gpr[reg], reg_info->byte_size);

return true;
55–59 ↗(On Diff #452909)
if (reg != gpr_x0_riscv)
  tester->gpr.gpr[reg] = reg_value.GetAsUInt64();

return true;
68 ↗(On Diff #452909)

Assert that this was successful.

79 ↗(On Diff #452909)

EncodeInstruction? I don't think we need to shorten the names quite this much.

103 ↗(On Diff #452909)

You check the ~1 here but your input values don't have the bottom bit set.

Does that mean this test is not covering that aspect of the emulation? Seems like it should be.

106 ↗(On Diff #452909)

EncodeBranch

140 ↗(On Diff #452909)

testBranch (you get the idea by now)

164 ↗(On Diff #452909)

Add comments to explain what these numbers are. With the macro wrapper it's hard to tell what they mean and what this is actually checking.

Oh and please note in the commit message that this does not support compressed instructions (or does it, I assume no but make it obvious).

Emmmer marked 30 inline comments as done.Aug 16 2022, 5:00 AM
Emmmer added inline comments.
lldb/source/Plugins/Instruction/RISCV/EmulateInstructionRISCV.cpp
83

Just to make the if condition explicitly exhaustive. Not relying on implicitly derived conditions helps the code survive larger refactorings that may break the implicit conditions.

Sure it can be simplified to reg_encode <= 31, but we should let compilers do it for us rather than doing it manually.

109

No. jal/jalr/b are always 4-byte instructions. Supporting RVC does not need to
change the +4 here.

120

It is like the mode bit in ARM but more flexible since riscv does not have two modes.

The spec says auxiliary information can be stored to LSB in function addresses (like JIT compilers can use this bit). It is open to developers.

126

Yes. funct3 is a widely used name in spec meaning 3-bits function selector

183

It's a pity that the whole debug spec was not useable (not implemented by any emulator nor hardware).

We don't use this code for platforms which implemented the debug spec. But for platforms without the debug spec we still need this emulation. We don't need to change the code here in either case in the future. So I guess it's fine to not record the issue.

223

A sense of ceremony haha. Will delete the comment accordingly.

250

Exactly! Comments are added now.

lldb/source/Plugins/Instruction/RISCV/EmulateInstructionRISCV.h
21

Yes, to distinguish public static members from public instance members.

56

Yes. SupportsThisInstructionType is also used in CreateInstance which is a static method.

lldb/unittests/Instruction/CMakeLists.txt
4 ↗(On Diff #452909)

This code confused me a lot and thank you for the explanation.
I am going to check ARM and AARCH64 explicitly.

16 ↗(On Diff #452909)

I don't know either. I see the original code checking TARGETS_TO_BUILD and there might be special consideration I suppose?

lldb/unittests/Instruction/RISCV/TestRISCVEmulator.cpp
9 ↗(On Diff #452909)

nice catch!

39 ↗(On Diff #452909)

It is actually an always-success dynamic_cast because:

  • The RISCVEmulatorTester is derived from EmulateInstruction
  • These callbacks are set in the constructor of RISCVEmulatorTester
  • All EmulateInstruction instances created in this unittest are always RISCVEmulatorTester

Signature of these callbacks is not polymorphic (like for Java we have <T extends EmulateInstruction> with bounded constraints.) and that's the reason why we need this cast. We may turn it into a template but it requires more effort.

I also noticed that there's a void *baton in parameters. But using that field looks more terrible. I guess we could have a userdata whose type was parametrized.

79 ↗(On Diff #452909)

The I stands for I-Type, a riscv instruction format. Like EncodeB below.

103 ↗(On Diff #452909)

0x1024 - 255 = 3877
(0x1024 - 255) & ~1 = 3876

I think it is covered 😃

106 ↗(On Diff #452909)

The B stands for B-Type which works for all B-type instructions.

Branch instructions are all of B-type so renaming it to EncodeBranch seems Okay right now, but we may add more B-type instructions in the future.

140 ↗(On Diff #452909)

Yes this should be Branch since it only tests branch instructions.

Emmmer updated this revision to Diff 452959.Aug 16 2022, 5:02 AM
Emmmer marked 11 inline comments as done.
Emmmer edited the summary of this revision. (Show Details)

address review comments

Emmmer marked an inline comment as done.Aug 16 2022, 5:03 AM

Are we expecting this code to only ever be given JAL or JALR at this time?

If it's the case that anything else will just return false or do nothing, might be worth adding a couple of tests that check that encodings that aren't branches just do nothing or fail. Obviously not every encoding maybe just flip 1 bit in the instruction type field just as a smoke test for each of jal and jalr.

lldb/source/Plugins/Instruction/RISCV/EmulateInstructionRISCV.cpp
223

Now worries. Usually I end up with these because I wrote out the steps as comments first and forget to delete them.

lldb/source/Plugins/Instruction/RISCV/EmulateInstructionRISCV.h
21

Got it, I missed that.

lldb/unittests/Instruction/CMakeLists.txt
16 ↗(On Diff #452909)

Sorry I completely missed that first line (the diff split at that point). Right, so you're just adding the riscv equivalent of what's already there. So actually just leave the ARM bit as is, I'll do some builds and find out whether it needs to change for AArch64.

lldb/unittests/Instruction/RISCV/TestRISCVEmulator.cpp
98 ↗(On Diff #452959)

Add a comment like "JALR will always zero the bottom bit of the target".

161 ↗(On Diff #452959)

Now I understand what's going on. I thought that this somehow included non branch encodings when in fact what it's doing is saying for a branch equals we should only branch if the operands are equal.

39 ↗(On Diff #452909)

But using that field looks more terrible. I guess we could have a userdata whose type was parametrized.

Yeah there's a lot of...not ideal patterns in this area but I wouldn't pay much attention to them.

Without opting into llvm's rtti equivalent I guess this is the best way to do it.

79 ↗(On Diff #452909)

Cool. It's mostly that EncodeB looks like the author got distracted half way through the name. But anyway I'm not familiar with the spec so if it matches that then great.

103 ↗(On Diff #452909)

Doh, yes I wasn't looking at the offset.

Emmmer added inline comments.Aug 16 2022, 6:35 AM
lldb/unittests/Instruction/CMakeLists.txt
16 ↗(On Diff #452909)

Oh... thank you! I am gonna revert the change for AArch64.

lldb/unittests/Instruction/RISCV/TestRISCVEmulator.cpp
79 ↗(On Diff #452909)

You're right, the name looks unfriendly to other developers. So I changed its name to EncodeIType (also for EncodeBType) to avoid confusion.

Emmmer updated this revision to Diff 452997.Aug 16 2022, 7:13 AM

address review comments

This revision is now accepted and ready to land.Aug 16 2022, 7:22 AM
This revision was automatically updated to reflect the committed changes.