This is an archive of the discontinued LLVM Phabricator instance.

[LLDB][RISCV] Add more instruction decode and execute for EmulateInstructionRISCV
ClosedPublic

Authored by Emmmer on Aug 27 2022, 8:02 AM.

Details

Summary

Add:

  • most of instructions from RVI base instructions set.
  • some instruction decode tests from objdump.

Further work:

  • implement riscv imac extension.

Diff Detail

Event Timeline

Emmmer created this revision.Aug 27 2022, 8:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptAug 27 2022, 8:02 AM
Emmmer requested review of this revision.Aug 27 2022, 8:02 AM
Emmmer updated this revision to Diff 456175.Aug 28 2022, 12:04 AM
tzb99 added a comment.Aug 29 2022, 5:24 PM

Upon this change, My lldb logging changes a bit. The instruction is not shown anymore, but stepping is still working. Is it as expected?

* thread #1, name = 'test.o', stop reason = signal SIGSTOP
    frame #0: 0x0000000000010528
lldb/source/Plugins/Instruction/RISCV/EmulateInstructionRISCV.cpp
35–39

Upon this change, My lldb logging changes a bit. The instruction is not shown anymore, but stepping is still working. Is it as expected?

* thread #1, name = 'test.o', stop reason = signal SIGSTOP
    frame #0: 0x0000000000010528

Upon this change, My lldb logging changes a bit. The instruction is not shown anymore, but stepping is still working. Is it as expected?

* thread #1, name = 'test.o', stop reason = signal SIGSTOP
    frame #0: 0x0000000000010528

I tried to reproduce but failed.
Could you please provide more information?
(for example: what is written in main, why use .o for debugging, and the command operations executed in lldb)

tzb99 added a comment.Aug 30 2022, 4:06 PM

Upon this change, My lldb logging changes a bit. The instruction is not shown anymore, but stepping is still working. Is it as expected?

* thread #1, name = 'test.o', stop reason = signal SIGSTOP
    frame #0: 0x0000000000010528

I tried to reproduce but failed.
Could you please provide more information?
(for example: what is written in main, why use .o for debugging, and the command operations executed in lldb)

Sure. My testing script is:

#include<stdio.h>
void fun2() {
	printf("Enter fun2\n");
}
int fun1() {
	printf("Enter fun1\n");
	fun2();
	return 10;}

int main(int argc, char **argv) {
	int a = fun1();
	int b = 15;
       	printf("hello world\n");
	return 0;
}

So which file format should be used for debugging? My command option on the client side is:

lldb-server g 0.0.0.0:2345 test.o

On the lldb (x86) side is:

lldb
gdb-remote 2345
s

The objdump of my test.o file is:

0000000000010528 <_start>:
   10528:       02e000ef                jal     ra,10556 <load_gp>
   1052c:       87aa                    mv      a5,a0
   1052e:       00000517                auipc   a0,0x0
   10532:       14650513                addi    a0,a0,326 # 10674 <main>
   10536:       6582                    ld      a1,0(sp)
   10538:       0030                    addi    a2,sp,8
   1053a:       ff017113                andi    sp,sp,-16
   1053e:       00000697                auipc   a3,0x0
   10542:       68068693                addi    a3,a3,1664 # 10bbe <__libc_csu_init>
   10546:       00000717                auipc   a4,0x0

My compilation flavor should be:

riscv64-unknown-linux-gnu-gcc -g -o test.o hello.c \
-static -static-libgcc -static-libstdc++ -march=rv64imafd

Please let me know if you need more information!

Emmmer added a comment.EditedAug 31 2022, 2:51 AM

Upon this change, My lldb logging changes a bit. The instruction is not shown anymore, but stepping is still working. Is it as expected?

* thread #1, name = 'test.o', stop reason = signal SIGSTOP
    frame #0: 0x0000000000010528

I tried to reproduce but failed.
Could you please provide more information?
(for example: what is written in main, why use .o for debugging, and the command operations executed in lldb)

Sure. My testing script is:

#include<stdio.h>
void fun2() {
	printf("Enter fun2\n");
}
int fun1() {
	printf("Enter fun1\n");
	fun2();
	return 10;}

int main(int argc, char **argv) {
	int a = fun1();
	int b = 15;
       	printf("hello world\n");
	return 0;
}

So which file format should be used for debugging? My command option on the client side is:

lldb-server g 0.0.0.0:2345 test.o

On the lldb (x86) side is:

lldb
gdb-remote 2345
s

The objdump of my test.o file is:

0000000000010528 <_start>:
   10528:       02e000ef                jal     ra,10556 <load_gp>
   1052c:       87aa                    mv      a5,a0
   1052e:       00000517                auipc   a0,0x0
   10532:       14650513                addi    a0,a0,326 # 10674 <main>
   10536:       6582                    ld      a1,0(sp)
   10538:       0030                    addi    a2,sp,8
   1053a:       ff017113                andi    sp,sp,-16
   1053e:       00000697                auipc   a3,0x0
   10542:       68068693                addi    a3,a3,1664 # 10bbe <__libc_csu_init>
   10546:       00000717                auipc   a4,0x0

My compilation flavor should be:

riscv64-unknown-linux-gnu-gcc -g -o test.o hello.c \
-static -static-libgcc -static-libstdc++ -march=rv64imafd

Please let me know if you need more information!

I coiped your test code to tzb.c

(on x86)

$ riscv64-unknown-linux-gnu-gcc -g -o tzb tzb.c \
  -static -static-libgcc -static-libstdc++ -march=rv64imafd

PS: -o usage is: -o <file>, it means place the output into <file>.


(on qemu-system-riscv64)

$ lldb
(lldb) gdb-remote 8888

(lldb) Process 5933 stopped
* thread #1, name = 'tzb', stop reason = signal SIGSTOP
    frame #0: 0x0000000000010528 tzb`_start at start.S:50
warning: This version of LLDB has no plugin for the language "assembler". Inspection of frame variables will be limited.

Then I tried to read DWARF in tzb, and dwarfdump report DW_AT_language is DW_LANG_Mips_Assembler, but DW_AT_language should be DW_LANG_C99.


It may be a bug caused by riscv-gnu-toolchain, and we can use this to avoid it.

(on qemu-system-riscv64)

$ gcc -g -o tzb tzb.c \
  -static -static-libgcc -static-libstdc++ -march=rv64imafd

$ lldb-server g 0.0.0.0:2345 tzb
Launched 'tzb' as process 6839...
lldb-server-local_build

(Switch to another tab)

$ lldb
(lldb) gdb-remote 2345

(lldb) Process 6839 stopped
* thread #1, name = 'tzb', stop reason = signal SIGSTOP
    frame #0: 0x0000000000010584 tzb`_start
tzb`_start:
->  0x10584 <+0>:  jal    48                    <--|
    0x10588 <+4>:  mv     a5, a0                <--|
    0x1058c <+8>:  auipc  a0, 121               <--|
    0x10590 <+12>: ld     a0, -980(a0)          <--| as expected
(lldb)

Then you can set breakpoints and go to where you want.


I copied some DWARF info that may be useful:

in cross-build elf file:

.debug_info

COMPILE_UNIT<header overall offset = 0x00000000>:
< 0><0x0000000c>  DW_TAG_compile_unit
                    DW_AT_stmt_list             0x00000000
                    DW_AT_low_pc                0x00010528
                    DW_AT_high_pc               <offset-from-lowpc>0x00000042
                    DW_AT_name                  ../sysdeps/riscv/start.S
                    DW_AT_comp_dir              /home/emmmer/git/riscv-gnu-toolchain/glibc/csu
                    DW_AT_producer              GNU AS 2.38
                    DW_AT_language              DW_LANG_Mips_Assembler

in native-build elf file:

.debug_info

COMPILE_UNIT<header overall offset = 0x00000000>:
< 0><0x0000000b>  DW_TAG_compile_unit
                    DW_AT_producer              GNU C17 10.3.1 -march=rv64imafd -mabi=lp64d -g
                    DW_AT_language              DW_LANG_C99
                    DW_AT_name                  tzb.c
                    DW_AT_comp_dir              /root
                    DW_AT_low_pc                0x000106ac
                    DW_AT_high_pc               <offset-from-lowpc>188
                    DW_AT_stmt_list             0x00000000
DavidSpickett added inline comments.Aug 31 2022, 6:13 AM
lldb/source/Plugins/Instruction/RISCV/EmulateInstructionRISCV.cpp
54

Is it worth putting an enable_if sizeof(T) <= 4 here? Just to prevent some potential silly mistakes.

56

I'm guessing that the int32->int64 could just be one cast to int64 but it does document the widening nicely so keep it.

66

This looks a lot like SextW but perhaps needing the shift means you can just call that here. In fact there are a few below as well.

185–416

Prefer EmulateInstructionRISCV& to pointers when you know it will never be null. A lot of existing code in this area passes a pointer but very few paths seem to ever handle nullptr.

(and if you have to take the address of the ref somewhere to fit the API that's fine but at least this new code can be a bit easier to reason about)

189

I don't know the riscv LUI but I used the mips one a lot, shouldn't this be setting the upper bits of the register. Where the register is 64 bits meaning the upper 32?

Maybe it doesn't work that way for 64 bit. On the surface I'd expect a << 32 here somewhere.

200

You could enforce that with https://en.cppreference.com/w/cpp/types/is_integral, or std::is_same if you have just 1/2 types you expect.

232

What is E here?

323

An alternative way to do this:

uint64_t result = int64_t(value.GetAsUInt64()) < int64_t(imm);
return WriteRegister(emulator, DecodeRD(inst), RegisterValue(result));

Saves some repetition.

(and I see you did this later, great minds)

417–418

static const, just for correctness.

449

Is this mask right?

SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value in
register rs1 by the shift amount held in the lower 5 bits of register rs2.

From The RISC-V Instruction Set Manual Volume I: User-Level ISA Document Version 2.2.

Unless "5" means position 5 through position 0 inclusive instead of a group of 5 bits.

But then later

SLL, SRL, and SRA perform logical left, logical right, and arithmetic right shifts on the value
in register rs1 by the shift amount held in register rs2. In RV64I, only the low 6 bits of rs2 are
considered for the shift amount.

Is it 64 bit uses 6 and 32 bit uses 5?

736

So what I get from this loop is that any instruction we have a way to emulate, is an instruction that could include the PC as its destination register. Do I have that correct?

lldb/unittests/Instruction/RISCV/TestRISCVEmulator.cpp
200–231

This checks the decoding but do we also want to check that they apply their results correctly?

As in, if I shift left something into the PC it should change the value of the PC. That sort of thing.

tzb99 added a comment.Aug 31 2022, 2:12 PM

Upon this change, My lldb logging changes a bit. The instruction is not shown anymore, but stepping is still working. Is it as expected?

* thread #1, name = 'test.o', stop reason = signal SIGSTOP
    frame #0: 0x0000000000010528

I tried to reproduce but failed.
Could you please provide more information?
(for example: what is written in main, why use .o for debugging, and the command operations executed in lldb)

Sure. My testing script is:

#include<stdio.h>
void fun2() {
	printf("Enter fun2\n");
}
int fun1() {
	printf("Enter fun1\n");
	fun2();
	return 10;}

int main(int argc, char **argv) {
	int a = fun1();
	int b = 15;
       	printf("hello world\n");
	return 0;
}

So which file format should be used for debugging? My command option on the client side is:

lldb-server g 0.0.0.0:2345 test.o

On the lldb (x86) side is:

lldb
gdb-remote 2345
s

The objdump of my test.o file is:

0000000000010528 <_start>:
   10528:       02e000ef                jal     ra,10556 <load_gp>
   1052c:       87aa                    mv      a5,a0
   1052e:       00000517                auipc   a0,0x0
   10532:       14650513                addi    a0,a0,326 # 10674 <main>
   10536:       6582                    ld      a1,0(sp)
   10538:       0030                    addi    a2,sp,8
   1053a:       ff017113                andi    sp,sp,-16
   1053e:       00000697                auipc   a3,0x0
   10542:       68068693                addi    a3,a3,1664 # 10bbe <__libc_csu_init>
   10546:       00000717                auipc   a4,0x0

My compilation flavor should be:

riscv64-unknown-linux-gnu-gcc -g -o test.o hello.c \
-static -static-libgcc -static-libstdc++ -march=rv64imafd

Please let me know if you need more information!

I coiped your test code to tzb.c

(on x86)

$ riscv64-unknown-linux-gnu-gcc -g -o tzb tzb.c \
  -static -static-libgcc -static-libstdc++ -march=rv64imafd

PS: -o usage is: -o <file>, it means place the output into <file>.


(on qemu-system-riscv64)

$ lldb
(lldb) gdb-remote 8888

(lldb) Process 5933 stopped
* thread #1, name = 'tzb', stop reason = signal SIGSTOP
    frame #0: 0x0000000000010528 tzb`_start at start.S:50
warning: This version of LLDB has no plugin for the language "assembler". Inspection of frame variables will be limited.

Then I tried to read DWARF in tzb, and dwarfdump report DW_AT_language is DW_LANG_Mips_Assembler, but DW_AT_language should be DW_LANG_C99.


It may be a bug caused by riscv-gnu-toolchain, and we can use this to avoid it.

(on qemu-system-riscv64)

$ gcc -g -o tzb tzb.c \
  -static -static-libgcc -static-libstdc++ -march=rv64imafd

$ lldb-server g 0.0.0.0:2345 tzb
Launched 'tzb' as process 6839...
lldb-server-local_build

(Switch to another tab)

$ lldb
(lldb) gdb-remote 2345

(lldb) Process 6839 stopped
* thread #1, name = 'tzb', stop reason = signal SIGSTOP
    frame #0: 0x0000000000010584 tzb`_start
tzb`_start:
->  0x10584 <+0>:  jal    48                    <--|
    0x10588 <+4>:  mv     a5, a0                <--|
    0x1058c <+8>:  auipc  a0, 121               <--|
    0x10590 <+12>: ld     a0, -980(a0)          <--| as expected
(lldb)

Then you can set breakpoints and go to where you want.


I copied some DWARF info that may be useful:

in cross-build elf file:

.debug_info

COMPILE_UNIT<header overall offset = 0x00000000>:
< 0><0x0000000c>  DW_TAG_compile_unit
                    DW_AT_stmt_list             0x00000000
                    DW_AT_low_pc                0x00010528
                    DW_AT_high_pc               <offset-from-lowpc>0x00000042
                    DW_AT_name                  ../sysdeps/riscv/start.S
                    DW_AT_comp_dir              /home/emmmer/git/riscv-gnu-toolchain/glibc/csu
                    DW_AT_producer              GNU AS 2.38
                    DW_AT_language              DW_LANG_Mips_Assembler

in native-build elf file:

.debug_info

COMPILE_UNIT<header overall offset = 0x00000000>:
< 0><0x0000000b>  DW_TAG_compile_unit
                    DW_AT_producer              GNU C17 10.3.1 -march=rv64imafd -mabi=lp64d -g
                    DW_AT_language              DW_LANG_C99
                    DW_AT_name                  tzb.c
                    DW_AT_comp_dir              /root
                    DW_AT_low_pc                0x000106ac
                    DW_AT_high_pc               <offset-from-lowpc>188
                    DW_AT_stmt_list             0x00000000

Thank you very much for the investigation! The transition from riscv64-unknown-linux-gnu-gcc to gcc on qemu did improved. I can also set breakpoint on the designated memory address (failed to read the function names tho (main, etc)). Another issue I encountered (with this current patch, and same issue happened before) is that, if I kept executing "stepping" command, the ret (jalr) instruction cannot be executed properly and then the program entered into the illegal instruction region. Have you encountered the same issue by keeping stepping?

Emmmer updated this revision to Diff 457243.Sep 1 2022, 6:05 AM
Emmmer marked 9 inline comments as done.
Emmmer edited the summary of this revision. (Show Details)

address review comments

lldb/source/Plugins/Instruction/RISCV/EmulateInstructionRISCV.cpp
66

When bit operation happens, inst is converted to int64, so we can't use SextW here directly.

449

Yes. The shamt field in RV32 is 5 bits, and The shamt field in RV64 is 6 bits.

736

Just implementing some branch instructions is not enough to debug some projects (like debugging atomic locks), so I added more instructions.

DavidSpickett added inline comments.Sep 1 2022, 6:28 AM
lldb/source/Plugins/Instruction/RISCV/EmulateInstructionRISCV.cpp
736

Right so this isn't just about instructions that branch (which is my mistake). Whenever we place a software breakpoint we're emulating the instruction we overwrote regardless of branching.

lldb/unittests/Instruction/RISCV/TestRISCVEmulator.cpp
166

Seems better to inline this below, I only see one call to it.

210

Please call this something other than map, tests perhaps? (it's a nit but at a glance you see oh its a map but oh wait it's in fact a vector)

Emmmer marked an inline comment as done.Sep 1 2022, 6:30 AM
Emmmer added inline comments.
lldb/source/Plugins/Instruction/RISCV/EmulateInstructionRISCV.cpp
189

The spec says:

LUI places the 32-bit U-immediate into register rd, filling in the lowest 12 bits with zeros. The 32-bit results sign-extended to 64 bits.

Emmmer updated this revision to Diff 457253.Sep 1 2022, 6:42 AM
Emmmer marked an inline comment as done.

address review comments

Emmmer marked 2 inline comments as done.Sep 1 2022, 6:44 AM
Emmmer added a comment.Sep 1 2022, 6:54 AM

Thank you very much for the investigation! The transition from riscv64-unknown-linux-gnu-gcc to gcc on qemu did improved. I can also set breakpoint on the designated memory address (failed to read the function names tho (main, etc)). Another issue I encountered (with this current patch, and same issue happened before) is that, if I kept executing "stepping" command, the ret (jalr) instruction cannot be executed properly and then the program entered into the illegal instruction region. Have you encountered the same issue by keeping stepping?

I didn't run into the illegal instruction region, but I did find an infinity loop(about atomic lock), due to the lack of RVA extension.

110cc:	100427af          	lr.w	a5,(s0)
110d0:	00079663          	bnez	a5,110dc <__run_exit_handlers+0x70>
110d4:	1ce426af          	sc.w.aq	a3,a4,(s0)
110d8:	fe069ae3          	bnez	a3,110cc <__run_exit_handlers+0x60>

sc.w.aq is going to set a3 to zero, if it didn't succeed, the bnez a3,110cc will jump to 110cc, causing an infinity loop.

DavidSpickett added inline comments.Sep 1 2022, 7:09 AM
lldb/source/Plugins/Instruction/RISCV/EmulateInstructionRISCV.cpp
189

Got it, I was expecting the LUI to "scale up" to 64 bit and be loading the upper 32 bits but that is not the case thanks for explaining.

DavidSpickett accepted this revision.Sep 1 2022, 7:10 AM

This LGTM.

This revision is now accepted and ready to land.Sep 1 2022, 7:10 AM