This is an archive of the discontinued LLVM Phabricator instance.

[RISCV][CodeGen] Support Zdinx on RV32 codegen
ClosedPublic

Authored by sunshaoce on May 3 2023, 6:12 AM.

Diff Detail

Event Timeline

sunshaoce created this revision.May 3 2023, 6:12 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 3 2023, 6:12 AM
sunshaoce requested review of this revision.May 3 2023, 6:12 AM

In D122918, @jrct27 said

I don't know why you've marked this as done; the line in question (which is now the line below) is not correct, the immediate could be 0x7f[cdef] at which point the arithmetic here will wrap to 0x80[0123] (which the machine verifier probably catches as out of range due to the immediate being signed?..). That is:

void
foo(void *p, double d)
{
    *(double *)((char *)p + 0x7fc) = d;
}

will be miscompiled, surely.

I tested it now. It seems that the bug has already been fixed.

$ bin/clang -cc1 -triple riscv32 -target-feature +zdinx -S test.c -O3 

foo:                                    # @foo
.Lfoo$local:
	.type	.Lfoo$local,@function
# %bb.0:                                # %entry
	addi	a3, a0, 2044
	sw	a1, 2044(a0)
	sw	a2, 4(a3)
	ret
$ bin/clang -cc1 -triple riscv32 -S test.c -O3

foo:                                    # @foo
.Lfoo$local:
	.type	.Lfoo$local,@function
# %bb.0:                                # %entry
	addi	a3, a0, 2044
	sw	a1, 2044(a0)
	sw	a2, 4(a3)
	ret
$ bin/clang -cc1 -triple riscv32 -target-feature +d -S test.c -O3

foo:                                    # @foo
.Lfoo$local:
	.type	.Lfoo$local,@function
# %bb.0:                                # %entry
	fsd	fa0, 2044(a0)
	ret

Any suggestions?@jrtc27

In D122918, @jrct27 said

I don't know why you've marked this as done; the line in question (which is now the line below) is not correct, the immediate could be 0x7f[cdef] at which point the arithmetic here will wrap to 0x80[0123] (which the machine verifier probably catches as out of range due to the immediate being signed?..). That is:

void
foo(void *p, double d)
{
    *(double *)((char *)p + 0x7fc) = d;
}

will be miscompiled, surely.

I tested it now. It seems that the bug has already been fixed.

$ bin/clang -cc1 -triple riscv32 -target-feature +zdinx -S test.c -O3 

foo:                                    # @foo
.Lfoo$local:
	.type	.Lfoo$local,@function
# %bb.0:                                # %entry
	addi	a3, a0, 2044
	sw	a1, 2044(a0)
	sw	a2, 4(a3)
	ret
$ bin/clang -cc1 -triple riscv32 -S test.c -O3

foo:                                    # @foo
.Lfoo$local:
	.type	.Lfoo$local,@function
# %bb.0:                                # %entry
	addi	a3, a0, 2044
	sw	a1, 2044(a0)
	sw	a2, 4(a3)
	ret
$ bin/clang -cc1 -triple riscv32 -target-feature +d -S test.c -O3

foo:                                    # @foo
.Lfoo$local:
	.type	.Lfoo$local,@function
# %bb.0:                                # %entry
	fsd	fa0, 2044(a0)
	ret

Any suggestions?@jrtc27

At the IR level (and DAG level) is it actually doing an f64 store? I can’t see how your code would ever work for that case. I also don’t know if unaligned accesses of globalisation needs to work, because if so your code is similarly broken there.

In D122918, @jrct27 said

I don't know why you've marked this as done; the line in question (which is now the line below) is not correct, the immediate could be 0x7f[cdef] at which point the arithmetic here will wrap to 0x80[0123] (which the machine verifier probably catches as out of range due to the immediate being signed?..). That is:

void
foo(void *p, double d)
{
    *(double *)((char *)p + 0x7fc) = d;
}

will be miscompiled, surely.

I tested it now. It seems that the bug has already been fixed.

$ bin/clang -cc1 -triple riscv32 -target-feature +zdinx -S test.c -O3 

foo:                                    # @foo
.Lfoo$local:
	.type	.Lfoo$local,@function
# %bb.0:                                # %entry
	addi	a3, a0, 2044
	sw	a1, 2044(a0)
	sw	a2, 4(a3)
	ret
$ bin/clang -cc1 -triple riscv32 -S test.c -O3

foo:                                    # @foo
.Lfoo$local:
	.type	.Lfoo$local,@function
# %bb.0:                                # %entry
	addi	a3, a0, 2044
	sw	a1, 2044(a0)
	sw	a2, 4(a3)
	ret
$ bin/clang -cc1 -triple riscv32 -target-feature +d -S test.c -O3

foo:                                    # @foo
.Lfoo$local:
	.type	.Lfoo$local,@function
# %bb.0:                                # %entry
	fsd	fa0, 2044(a0)
	ret

Any suggestions?@jrtc27

At the IR level (and DAG level) is it actually doing an f64 store? I can’t see how your code would ever work for that case. I also don’t know if unaligned accesses of globalisation needs to work, because if so your code is similarly broken there.

I think returning an f64 load will get bitcasted by the ABI which will convert the load to an i64 load which will go to type legalization. You need to test a load followed by an FP operation like fadd.

sunshaoce updated this revision to Diff 519321.May 3 2023, 6:25 PM
sunshaoce updated this revision to Diff 519333.May 3 2023, 7:18 PM

Update unpackF64OnRV32DSoftABI, do not use RISCVISD::BuildPairF64 on zdinx, but use build_pair + bitcast.
I forgot to submit this before, now we can get:

$ bin/clang -cc1 -triple riscv32 -target-feature +zdinx -S test.c -O3 

foo:                                    # @foo
.Lfoo$local:
	.type	.Lfoo$local,@function
# %bb.0:                                # %entry
	addi	a3, a0, 2044
	sw	a1, 2044(a0)
	sw	a2, 4(a3)
	ret
sunshaoce updated this revision to Diff 519382.May 4 2023, 1:29 AM

Add a test from *(double *)((char *)p + 0x7fc) = d;

This test fails the verifier

define void @foo(ptr nocapture %p, double %d) {                                                                                          
entry:                                                                           
  %a = fadd double %d, %d                                                        
  %add.ptr = getelementptr inbounds i8, ptr %p, i64 2044                         
  store double %a, ptr %add.ptr, align 8                                         
  ret void                                                                       
}
craig.topper requested changes to this revision.May 4 2023, 3:59 PM
This revision now requires changes to proceed.May 4 2023, 3:59 PM
liaolucy added inline comments.May 4 2023, 9:38 PM
llvm/lib/Target/RISCV/RISCVInstrInfoD.td
519

We might add def simm12_sub4, the range is [-2048,2023]. Then replace simm12 with simm12_sub4.

jrtc27 added inline comments.May 4 2023, 11:40 PM
llvm/lib/Target/RISCV/RISCVInstrInfoD.td
519

That doesn’t help you, globals etc are still broken.

liaolucy added inline comments.May 5 2023, 7:37 AM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
284

I am trying another solution, help to see, will it have any potential problems? @jrtc27, thanks.

else {
    Register TmpReg = MBBI->getOperand(1).getReg();
   BuildMI(MBB, MBBI, DL, TII->get(RISCV::ADDI), TmpReg)
        .add(MBBI->getOperand(1))
        .addImm(4);
   BuildMI(MBB, MBBI, DL, TII->get(RISCV::SW))
       .addReg(Hi, getKillRegState(MBBI->getOperand(0).isKill()))
       .add(MBBI->getOperand(1))
       .add(MBBI->getOperand(2));
 }
liaolucy added inline comments.May 5 2023, 7:52 AM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
276

The global processing is in this if. Maybe there is something unsafe to handle here? Please help with some suggestions.

sunshaoce added inline comments.May 5 2023, 8:33 AM
llvm/lib/Target/RISCV/RISCVInstrInfoD.td
519

Can this code test globals?

double d = 42;
void foo(char *p) { *(double *)(p + 0x7fc) = d; }
craig.topper added inline comments.May 5 2023, 11:12 AM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
278

This doesn't work. The load/store is paired with an LUI. The offset in the LUI needs to match. This just moved the overflow to the linker.

liaolucy added inline comments.May 5 2023, 7:26 PM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
284

This is also wrong. There are some tests that still fail. Thanks.

Maybe we shouldn't expand the Pseudo in RISCVExpandPseudoInsts.cpp.
It seems that one more MI would get the following error.

llc: /home/liaochunyu/llvm-project/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp:83: virtual bool (anonymous namespace)::RISCVExpandPseudo::runOnMachineFunction(llvm::MachineFunction &): Assertion `OldSize >= NewSize' failed.

Maybe we should learn from jrtc27's reference https://github.com/CTSRD-CHERI/llvm-project/pull/635/files

Maybe we shouldn't expand the Pseudo in RISCVExpandPseudoInsts.cpp.
It seems that one more MI would get the following error.

llc: /home/liaochunyu/llvm-project/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp:83: virtual bool (anonymous namespace)::RISCVExpandPseudo::runOnMachineFunction(llvm::MachineFunction &): Assertion `OldSize >= NewSize' failed.

That error means you're missing a let Size = 8 on the pseudo in tablegen.

Maybe we shouldn't expand the Pseudo in RISCVExpandPseudoInsts.cpp.
It seems that one more MI would get the following error.

llc: /home/liaochunyu/llvm-project/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp:83: virtual bool (anonymous namespace)::RISCVExpandPseudo::runOnMachineFunction(llvm::MachineFunction &): Assertion `OldSize >= NewSize' failed.

That error means you're missing a let Size = 8 on the pseudo in tablegen.

Here, it looks like there is

let isCall = 0, mayLoad = 1, mayStore = 0, Size = 8, isCodeGenOnly = 1 in
def PseudoRV32ZdinxLD : Pseudo<(outs GPRPF64:$dst), (ins GPR:$rs1, simm12:$imm12), []>;
defm : LdPat<load, PseudoRV32ZdinxLD, f64>;

/// Stores
let isCall = 0, mayLoad = 0, mayStore = 1, Size = 8, isCodeGenOnly = 1 in
def PseudoRV32ZdinxSD : Pseudo<(outs), (ins GPRPF64:$rs2, GPR:$rs1, simm12:$imm12), []>;
defm : StPat<store, PseudoRV32ZdinxSD, GPRPF64, f64>;

Maybe we shouldn't expand the Pseudo in RISCVExpandPseudoInsts.cpp.
It seems that one more MI would get the following error.

llc: /home/liaochunyu/llvm-project/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp:83: virtual bool (anonymous namespace)::RISCVExpandPseudo::runOnMachineFunction(llvm::MachineFunction &): Assertion `OldSize >= NewSize' failed.

That error means you're missing a let Size = 8 on the pseudo in tablegen.

Maybe I don’t know you meant by one more MI

Maybe we shouldn't expand the Pseudo in RISCVExpandPseudoInsts.cpp.
It seems that one more MI would get the following error.

llc: /home/liaochunyu/llvm-project/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp:83: virtual bool (anonymous namespace)::RISCVExpandPseudo::runOnMachineFunction(llvm::MachineFunction &): Assertion `OldSize >= NewSize' failed.

That error means you're missing a let Size = 8 on the pseudo in tablegen.

Maybe I don’t know you meant by one more MI

Maybe we shouldn't expand the Pseudo in RISCVExpandPseudoInsts.cpp.
It seems that one more MI would get the following error.

llc: /home/liaochunyu/llvm-project/llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp:83: virtual bool (anonymous namespace)::RISCVExpandPseudo::runOnMachineFunction(llvm::MachineFunction &): Assertion `OldSize >= NewSize' failed.

That error means you're missing a let Size = 8 on the pseudo in tablegen.

Maybe I don’t know you meant by one more MI

One more ADDI than before.

else {
    Register TmpReg = MBBI->getOperand(1).getReg();
   BuildMI(MBB, MBBI, DL, TII->get(RISCV::ADDI), TmpReg)
        .add(MBBI->getOperand(1))
        .addImm(4);
   BuildMI(MBB, MBBI, DL, TII->get(RISCV::SW))
       .addReg(Hi, getKillRegState(MBBI->getOperand(0).isKill()))
       .add(MBBI->getOperand(1))
       .add(MBBI->getOperand(2));
 }

but the test fail:

define void @foo(ptr nocapture %p, double %d) {                                                                                          
entry:                                                                           
  %a = fadd double %d, %d                                                        
  %add.ptr = getelementptr inbounds i8, ptr %p, i64 2044                         
  store double %a, ptr %add.ptr, align 8                                         
  ret void                                                                       
}

Guess it's because of add one more MI.

It suddenly occurred to me that might not be a problem with size. Maybe my global didn't handle it well, then has this error,thanks, thanks!

craig.topper added inline comments.May 5 2023, 11:16 PM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
278

I guess if the global is 8 byte aligned, the +4 won't cause any wrap.

liaolucy added inline comments.May 6 2023, 12:57 AM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
278

The current test affected by this code is fld_fsd_global in double-mem.ll.

sunshaoce updated this revision to Diff 520040.May 6 2023, 1:17 AM

Add more tests in zdinx-rv32.ll

craig.topper added inline comments.May 6 2023, 12:09 PM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
274

You don't need the getKillRegState call here. Also it's argument is a bool so passing 0 to it was questionable.

292

Operand 1 isn't available to clobber here unless the kill flag is set. If's not set you would need to subtract the +4 after the store.

craig.topper added inline comments.May 6 2023, 12:12 PM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
292

It's also possible that operand 1 is X0. I think that could occur for a global with an absolute address of 0x7fc.

liaolucy added inline comments.May 6 2023, 5:28 PM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
292

def GPRNoX0 --- new register class without X0

def PseudoRV32ZdinxSD : Pseudo<(outs), (ins GPRPF64:$rs2, GPR:$rs1, simm12:$imm12), []>;

to:

def PseudoRV32ZdinxSD : Pseudo<(outs), (ins GPRPF64:$rs2, GPRNoX0:$rs1, simm12:$imm12), []>;

Can this solve the problem?(I feel like I'm making the same mistake as the previous one.)

sunshaoce updated this revision to Diff 520259.May 7 2023, 11:05 PM
sunshaoce marked 3 inline comments as done.

Address comments

craig.topper added inline comments.May 9 2023, 10:04 AM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
278

f64 loads are not guaranteed to be a 8 byte aligned the unaligned-scalar-mem subtarget feature is used.

291

As I said before, we can't modify the operand 1 register if it is used after the store.

321

Lo could be the same register as operand 1, in which case you can't use it here because it was overwritten.

sunshaoce updated this revision to Diff 521109.May 10 2023, 2:45 PM
sunshaoce marked an inline comment as done and an inline comment as not done.
  1. By swapping the order of loading Lo and Hi, the issue when Operand 1 is equal to Lo can be resolved.
  2. The issue of overflow during loading has been fixed, and a test foo4 has been added in zdinx-rv32.ll.
  3. FIXMEs has been added for the lack of support in zdinx rv32 for unaligned memory.
craig.topper added inline comments.May 10 2023, 9:22 PM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
281

I dont' think the offset is guaranteed to be 0. RISCVDAGToDAGISel::SelectAddrRegImm can update the offset when the address is formed. I think you can assume getOffset() % 8 == 0 which is enough for the +4 to not overflow.

298

It's possible that operand 1 and Hi are the same register. If they are, the +4 to operand 1 will cause Hi to be the wrong value when you store it.

StephenFan added inline comments.May 11 2023, 12:43 AM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
279

I don't think we need to assert unaligned scalar memory feature is not enabled. Since RISCVDAGToDAGISel::SelectAddrRegImm has ensured that the global value's alignment must be greater than the result of getOffset() even when the unaligned scalar memory feature is enabled.

sunshaoce marked 2 inline comments as done.
  1. Use getOffset() % 8 == 0
  2. Use Lo instead of Operand 1 in BuildMI(MBB,MBBI,DL,TII->get(RISCV::SW))
craig.topper added inline comments.May 11 2023, 11:31 AM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
302

Doesn't this need to use Lo as its input?

llvm/test/CodeGen/RISCV/zdinx-rv32.ll
17 ↗(On Diff #521220)

As I noted in the source code, this should be addi a2, a2, -4

sunshaoce updated this revision to Diff 521533.May 11 2023, 7:16 PM
sunshaoce marked 2 inline comments as done.

Address comments

StephenFan added inline comments.May 11 2023, 8:00 PM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
288
350

if MBBI->getOperand(1).isKill() is true, maybe we don't need to add back -4?

jrtc27 added inline comments.May 11 2023, 8:24 PM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
350

This is just silly; expand it during ISel and let the register allocator figure out if it can reuse the register or needs to allocate an extra one

350

Also surely that would let the existing peephole passes work on optimising the simple expanded form into the more compact form when the immediate fits?

StephenFan added inline comments.May 11 2023, 10:12 PM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
350

I agree with @jrtc27 that maybe we can try to custom lowering f64 load store during ISel.

sunshaoce updated this revision to Diff 522609.May 16 2023, 7:30 AM
sunshaoce marked an inline comment as done.
  1. Added instructions removed from Zhinx
  2. Revised SelectAddrRegImm to optimize zdinx-rv32.ll
sunshaoce updated this revision to Diff 522613.May 16 2023, 7:38 AM

Use LdPat/StPat in load/store again

craig.topper added inline comments.May 16 2023, 6:08 PM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
2328

Does this fail UBSAN if CVal is 0x7fffffffffffffff?

craig.topper added inline comments.May 16 2023, 8:29 PM
llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
2325

This affects all loads and stores, not just the f64 case.

sunshaoce updated this revision to Diff 522943.May 17 2023, 1:37 AM

Use if (isInt<12>(CVal) && isInt<12>(CVal + ZdinxRange)) in SelectAddrRegImm

sunshaoce updated this revision to Diff 522987.May 17 2023, 3:44 AM
sunshaoce marked an inline comment as done.

Use SelectAddrRegImmINX in Zdinx RV32

evandro removed a subscriber: evandro.May 17 2023, 3:51 PM
craig.topper added inline comments.May 22 2023, 12:42 PM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
321

This assert should be Offset %8 == 0 like it is in the store case.

llvm/lib/Target/RISCV/RISCVISelDAGToDAG.cpp
2403

Do not copy and paste the entire original function with only a small change. Add a boolean flag to the existing function and change only the behavior that needs to change.

sunshaoce updated this revision to Diff 524549.May 22 2023, 6:36 PM
sunshaoce marked 2 inline comments as done.

Address comments. Thanks!

This revision is now accepted and ready to land.May 24 2023, 10:21 AM
This revision was automatically updated to reflect the committed changes.
chapuni added inline comments.
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
278

STI is used only in +Asserts.

jrtc27 added a comment.EditedMay 25 2023, 1:56 AM

I don’t see any tests for loading from globals and constant pools, so there’s no coverage of that code?

(EDIT: Globals are indeed there, but only for position-dependent medlow code, not sure why my search for %lo earlier didn't work)

sunshaoce marked an inline comment as done.EditedMay 25 2023, 9:44 PM

STI is used only in +Asserts.

Thanks! It has been fixed in D151435.

I don’t see any tests for loading from globals and constant pools, so there’s no coverage of that code?

(EDIT: Globals are indeed there, but only for position-dependent medlow code, not sure why my search for %lo earlier didn't work)

I added a test for Constant Pool in D151534. If there are any tests that haven't been covered yet, can you help provide them?