This is an archive of the discontinued LLVM Phabricator instance.

[RISCV][CodeGen] Support Zfinx,Zdinx,Zhinx,Zhinxmin codegen
AbandonedPublic

Authored by sunshaoce on Apr 1 2022, 10:33 AM.

Diff Detail

Event Timeline

sunshaoce created this revision.Apr 1 2022, 10:33 AM
Herald added a project: Restricted Project. · View Herald TranscriptApr 1 2022, 10:33 AM
sunshaoce requested review of this revision.Apr 1 2022, 10:33 AM
sunshaoce updated this revision to Diff 420063.Apr 3 2022, 9:05 AM

Support float-arith in RV64 Zfinx

sunshaoce edited the summary of this revision. (Show Details)Apr 3 2022, 9:06 AM
sunshaoce retitled this revision from [RISCV][CodeGen] Support float-arith in RV32 zfinx to [RISCV][CodeGen] Support float-arith in Zfinx.
sunshaoce added subscribers: realqhc, liaolucy.
gabcoh added a subscriber: gabcoh.Apr 3 2022, 10:49 AM
sunshaoce updated this revision to Diff 420709.Apr 5 2022, 10:19 PM

Add abi info in test files

hughperkins added a subscriber: hughperkins.EditedApr 12 2022, 3:35 PM

Note: I get a crash when I try to compile some LLVM IR. Crash trace:

$ bin/llc --march riscv32 -mattr +zfinx test.ll
LLVM ERROR: Cannot select: t8: ch = store<(store (s32) into %ir.4)> t7, t4, t6, undef:i32
  t4: f32,ch = load<(dereferenceable load (s32) from %ir.2)> t0, FrameIndex:i32<1>, undef:i32
    t1: i32 = FrameIndex<1>
    t3: i32 = undef
  t6: i32,ch = load<(dereferenceable load (s32) from %ir.1, align 8)> t0, FrameIndex:i32<0>, undef:i32
    t5: i32 = FrameIndex<0>
    t3: i32 = undef
  t3: i32 = undef
In function: _Z10sum_floatsPfjS_
PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace.
Stack dump:
0.	Program arguments: bin/llc --march riscv32 -mattr +zfinx test.ll
1.	Running pass 'Function Pass Manager' on module 'test.ll'.
2.	Running pass 'RISCV DAG->DAG Pattern Instruction Selection' on function '@_Z10sum_floatsPfjS_'
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  llc                      0x000000010974962d llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 61
1  llc                      0x0000000109749bdb PrintStackTraceSignalHandler(void*) + 27
2  llc                      0x000000010974785b llvm::sys::RunSignalHandlers() + 139
3  llc                      0x000000010974b7d8 SignalHandler(int) + 232
4  libsystem_platform.dylib 0x00007fff206e5d7d _sigtramp + 29
5  libsystem_platform.dylib 000000000000000000 _sigtramp + 18446603339972059808
6  libsystem_c.dylib        0x00007fff205f5406 abort + 125
7  llc                      0x00000001095aae4a llvm::report_fatal_error(llvm::Twine const&, bool) + 394
8  llc                      0x0000000109432305 llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*) + 725
9  llc                      0x000000010942dd72 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) + 26690
10 llc                      0x0000000106faec6c llvm::RISCVDAGToDAGISel::SelectCode(llvm::SDNode*) + 44
11 llc                      0x0000000106fade73 llvm::RISCVDAGToDAGISel::Select(llvm::SDNode*) + 31299
12 llc                      0x000000010941e7fd llvm::SelectionDAGISel::DoInstructionSelection() + 1629
13 llc                      0x000000010941cf4d llvm::SelectionDAGISel::CodeGenAndEmitDAG() + 6205
14 llc                      0x000000010941b56f llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, true, false, void>, false, true>, llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, true, false, void>, false, true>, bool&) + 399
15 llc                      0x0000000109419649 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 6329
16 llc                      0x0000000109415d2c llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 2268
17 llc                      0x0000000106fb0668 llvm::RISCVDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 56
18 llc                      0x0000000107cc0dcd llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 541
19 llc                      0x0000000108645dbc llvm::FPPassManager::runOnFunction(llvm::Function&) + 700
20 llc                      0x000000010864dce5 llvm::FPPassManager::runOnModule(llvm::Module&) + 117
21 llc                      0x0000000108646784 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) + 772
22 llc                      0x00000001086462a8 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 296
23 llc                      0x000000010864e081 llvm::legacy::PassManager::run(llvm::Module&) + 33
24 llc                      0x0000000106eeba16 compileModule(char**, llvm::LLVMContext&) + 6742
25 llc                      0x0000000106ee977d main + 1501
26 libdyld.dylib            0x00007fff206bbf3d start + 1
27 libdyld.dylib            0x0000000000000006 start + 18446603339972231370
Abort trap: 6

Corresponding LLVM IR:

define dso_local void @_Z10sum_floatsPfjS_() #0 {
  %1 = alloca float*, align 8
  %2 = alloca float, align 4
  %3 = load float, float* %2, align 4
  %4 = load float*, float** %1, align 8
  store float %3, float* %4, align 4
  ret void
}

(Note that it compiles just fine with +f (but using float registers...), or with nothing (but uses softwawre emulation for the fadd)

(Edited to shorten llvm ir code a lot :) )

hughperkins added a comment.EditedApr 13 2022, 2:41 AM

Ok, I've since learned that this PR doesnt handle load/store. Only arithmetic.

However note that immediate float arithmetic also seems to be missing, e.g. the following crashes

define noundef float @_Z4funcf(float noundef %0) #0 {
  %2 = fadd float %0, 3.000000e+00
  ret float %2
}

Compiled with:

bin/llc --march riscv32 -mattr +zfinx test_fadd_imm.ll

Gives:

LLVM ERROR: Cannot select: t15: f32,ch = load<(load (s32) from constant-pool)> t0, t19, undef:i32
  t19: i32 = ADDI t18, TargetConstantPool:i32<float 3.000000e+00> 0 [TF=3]
    t18: i32 = LUI TargetConstantPool:i32<float 3.000000e+00> 0 [TF=4]
      t16: i32 = TargetConstantPool<float 3.000000e+00> 0 [TF=4]
    t17: i32 = TargetConstantPool<float 3.000000e+00> 0 [TF=3]
  t14: i32 = undef
In function: _Z4funcf

(but compiles ok with +f)

hughperkins added inline comments.Apr 13 2022, 3:39 AM
llvm/lib/Target/RISCV/RISCVInstrInfoF.td
515

I think it's confusing that FINX in this context means F and F_INX. I feel it might be good to make more explicit that FINX includes F. How to do that is an open question. I'd suggest FFINX, but FFINX is already taken to mean FF with FF_INX. Maybe we need an explicit acronym to mean 'F and F_INX', like e.g. 'ALLF'. Or perhaps use a verbose form like 'F_AND_FINX'? (If it was my own decision I'd go for the explicit F_AND_FINX)

Also, not really part of this PR, but I don't know where else to put this. This code:

def FF        : ExtInfo_rr<FExt,       FPR32,    FPR32>;
def FF_INX    : ExtInfo_rr<ZfinxExt,   FPR32INX, FPR32INX>;
def FX        : ExtInfo_rr<FExt,       FPR32,    GPR>;
def FX_INX    : ExtInfo_rr<ZfinxExt,   FPR32INX, GPR>;
def FX_64     : ExtInfo_rr<F64Ext,     FPR32,    GPR>;
def FX_INX_64 : ExtInfo_rr<Zfinx64Ext, FPR32INX, GPR>;
def XF        : ExtInfo_rr<FExt,       GPR,      FPR32>;
def XF_64     : ExtInfo_rr<F64Ext,     GPR,      FPR32>;
def XF_INX    : ExtInfo_rr<ZfinxExt,   GPR,      FPR32INX>;
def XF_INX_64 : ExtInfo_rr<Zfinx64Ext, GPR,      FPR32INX>;

... I feel it would be good to add a docstring explaining what this is, like e.g. something like:

Each of these definitions represents a single combination of register spaces.
For example, `FX` means that the destination register is a float register (`FPR32`), e.g. `ft1`,
and that the source register is a general `x` register (`GPR`), e.g. `x1`.
The `_INX` versions are for the `+zfinx` extension. In this case, `FPR32INX` is effectively a synonym in practice
for `GPR`. At least `FPR32INX` are emitted as general `x` registers, even though they 'look like' float registers
for much of the tablegen process.

Ok, I've since learned that this PR doesnt handle load/store. Only arithmetic.

However note that immediate float arithmetic also seems to be missing, e.g. the following crashes

define noundef float @_Z4funcf(float noundef %0) #0 {
  %2 = fadd float %0, 3.000000e+00
  ret float %2
}

Compiled with:

bin/llc --march riscv32 -mattr +zfinx test_fadd_imm.ll

Gives:

LLVM ERROR: Cannot select: t15: f32,ch = load<(load (s32) from constant-pool)> t0, t19, undef:i32
  t19: i32 = ADDI t18, TargetConstantPool:i32<float 3.000000e+00> 0 [TF=3]
    t18: i32 = LUI TargetConstantPool:i32<float 3.000000e+00> 0 [TF=4]
      t16: i32 = TargetConstantPool<float 3.000000e+00> 0 [TF=4]
    t17: i32 = TargetConstantPool<float 3.000000e+00> 0 [TF=3]
  t14: i32 = undef
In function: _Z4funcf

(but compiles ok with +f)

Thanks, I'll try to fix it.

llvm/lib/Target/RISCV/RISCVInstrInfoF.td
515

Your suggestion is very good, but I think FINX should be replaced with a shorter name because it appears as a parameter, eg F_INX. And consider changing the name of F_INX to FprINX. Would you like to submit a new patch about this?

Thanks, I'll try to fix it.

Awesome :) Thank you :)

As a general comment, after thinking about this whilst brushing my teeth, I kind of feel that it might be clearer and simpler for everyone, both F people and Zfinx people, if we decoupled F and Zfinx into two separate files. This way F people don't have to figure out what is Zfinx. They just keep their clean separate F file. and Zfinx people dont have to think about how our bits mangle with F bits. Yes involves a bit of duplication, but if it's clearer/cleaner for everyone, maybe it might be a good approach?

After all, Zfinx is not an extension of F. It's a separate, mutually incompatible extension. And it's a fairly niche extension, I feel.

(Conversely, if we feel that zfinx is a non-niche extension, then it should have a shorter name I feel, eg maybe U for untyped registers)

asb added a comment.Apr 13 2022, 8:23 AM

As a general comment, after thinking about this whilst brushing my teeth, I kind of feel that it might be clearer and simpler for everyone, both F people and Zfinx people, if we decoupled F and Zfinx into two separate files. This way F people don't have to figure out what is Zfinx. They just keep their clean separate F file. and Zfinx people dont have to think about how our bits mangle with F bits. Yes involves a bit of duplication, but if it's clearer/cleaner for everyone, maybe it might be a good approach?

After all, Zfinx is not an extension of F. It's a separate, mutually incompatible extension. And it's a fairly niche extension, I feel.

Hi Hugh, and welcome! The naming of the extension is outside of our control I'm afraid.

I see your point, though the main reasoning about keeping F and Zfinx together is that they are largely the same instructions, and the hope is to define instructions and patterns together where feasible.

hughperkins added a comment.EditedApr 13 2022, 3:31 PM

I see your point, though the main reasoning about keeping F and Zfinx together is that they are largely the same instructions, and the hope is to define instructions and patterns together where feasible.

Yeah, I totally understand that. And when I first looked at this PR, I thought that made total sense. But then I saw how challenging it is to try to combine the two.

As far as the general principle of 'lets keep things together with similar patterns', the same argument can be levied against the pair of F and D, and yet they are in separate files, and I currently feel putting F and D in separate files is likely a reasonable idea. Despite their apparent similarities (32 vs 64 bits...), it is simpler and easier to keep D and F separate. Compared to D vs F, F vs Zfinx are I feel more different, and there is more reason to keep them apart. But even if they were the same difference as D and F, I feel keeping Zfinx and F apart coudl be a good idea:

  • Zfinx is fairly niche, and seems unfair to 'pollute' F with it?
  • I suspect Zfinx might be easier to implement if it is kept separaet fomr F.

Anyway, I seem likely to at least dabble in creating a decoupled version of Zfinx. I sort of have a hunch it will be eiaser. If it's not, well, I will come back, and say "well, yes, you were right, it is good to keep Zfinx and F together because ... [reasons X, Y, Z]".

asb added a comment.Apr 14 2022, 1:17 AM

@hughperkins: You make good points. It's not immediately obvious what kind of splitting would be better, and at what point the added complexity of trying to have piggy-back zfinx on current F/D patterns hampers readability and maintainability rather than enhance it. If you have time to play around with some options, I'd be really interested in hearing what you come up with.

Oooo. you've made load and store work :) Taht's fantastic :) Thank you :)

sunshaoce updated this revision to Diff 427929.May 8 2022, 6:17 AM

Fully support zfinx codegen

sunshaoce retitled this revision from [RISCV][CodeGen] Support float-arith in Zfinx to [RISCV][CodeGen] Support Zfinx codegen.May 8 2022, 6:19 AM

By the way, whilst I remember, since @sunshaoce has implemented all the functionality I need (for now), so I'm not intending to spend any time on working on this. Everything I tested in this PR worked great for me :)

sunshaoce updated this revision to Diff 427933.May 8 2022, 7:00 AM

Fix errors

sunshaoce marked an inline comment as done.

Support zdinx,zhinx codegen

sunshaoce retitled this revision from [RISCV][CodeGen] Support Zfinx codegen to [RISCV][CodeGen] Support Zfinx,Zdinx,Zhinx,Zhinxmin codegen.Jun 20 2022, 10:47 PM
reames requested changes to this revision.Aug 31 2022, 3:13 PM
reames added a subscriber: reames.

A couple of very high level review comments.

  1. This patch commingles the assembler/disassembly support and the pattern matching required for IR lowering. The typical process for an extension is to start with a patch specific to the MC layer (i.e. assembler/disassembler), and then move to codegen support in a later patch. I strongly strongly recommend you split if you want meaningful chance of review.
  1. You don't appear to have any assembler tests. That's a show stopper right there.
  1. On the MC part, please try to get the new extension as separate as possible from the existing floating point ones. I realize there's some potential for code savings, but the change will be lower risk if it touches as little as possible. Once it's in, and tested, changes to share code incrementally would be reasonable.

I strongly recommend you address these points. This patch is unlikely to get meaningful review without them addressed.

This revision now requires changes to proceed.Aug 31 2022, 3:13 PM

Assembler and disassembler support was already commited in https://reviews.llvm.org/D93298

craig.topper added inline comments.Aug 31 2022, 3:19 PM
llvm/lib/Target/RISCV/RISCVInstrInfoD.td
105 ↗(On Diff #456908)

Is this something that should have been in the MC layer patch?

jrtc27 added inline comments.Aug 31 2022, 3:25 PM
llvm/lib/Target/RISCV/RISCVInstrInfoD.td
105 ↗(On Diff #456908)

No. These don't exist. From the spec:

Load-pair and store-pair instructions are not provided, so transferring
double-precision operands in RV32Zdinx from or to memory requires
two loads or stores.

Assembler and disassembler support was already commited in https://reviews.llvm.org/D93298

Then why does this contain what look to be MC related changes to .td files? Looking closer, I do see that something has already landed, but there's definitely MC pieces in this. Confusing to say the least.

Assembler and disassembler support was already commited in https://reviews.llvm.org/D93298

Then why does this contain what look to be MC related changes to .td files? Looking closer, I do see that something has already landed, but there's definitely MC pieces in this. Confusing to say the least.

Can you point to a specific example?

llvm/lib/Target/RISCV/RISCVInstrInfoD.td
105 ↗(On Diff #456908)

Are these expanded somewhere that I'm not seeing?

craig.topper added inline comments.Aug 31 2022, 3:48 PM
llvm/test/CodeGen/RISCV/double-mem.ll
22 ↗(On Diff #456908)

This seems to be emitting an ld instruction in rv32 which is illegal.

jrtc27 added inline comments.Aug 31 2022, 3:51 PM
llvm/lib/Target/RISCV/RISCVInstrInfoD.td
105 ↗(On Diff #456908)

Nope, you can see them in the test output for the RV32Zdinx tests... it's just wrong. Whether these should be pseudos or we should be teaching the legaliser that f64 is legal except for loads/stores I don't know; the latter would be nice, but the interaction with spills sounds bad.

I had a similar issue downstream in CHERI-LLVM where, depending on the mode and pointer address space, there is no FLD/FSD available (when the pointer address space doesn't match the default for the encoding mode used for the current ABI). I used BuildPairF64Pseudo on loading from the address and loading from adding 4 to the address in the TableGen patterns, and added new (Cheri)SplitStoreF64Pseudo's for the store side that do SplitF64Pseudo plus storing to additional addresses. Presumably some of that could be reused if adopting that approach rather than legalisation, though if there's a nicer way without going to legalisation I'd be interested to see it as it's rather ugly (and the only user I can see of (KILL ...) in TableGen patterns themselves...). https://github.com/CTSRD-CHERI/llvm-project/pull/635/files, for what it's worth.

sunshaoce marked 5 inline comments as done.

Replace ld with lw in RV32 Zdinx

Assembler and disassembler support was already commited in https://reviews.llvm.org/D93298

Then why does this contain what look to be MC related changes to .td files? Looking closer, I do see that something has already landed, but there's definitely MC pieces in this. Confusing to say the least.

Can you point to a specific example?

I think I was seeing the FLD_IN32X definition from the Aug 28th version of the patch. This has since been removed, and even at the time of my comment, I must have been looking at a stale diff. Sorry for the noise.

Two minor optional style comments.

llvm/lib/Target/RISCV/RISCVISelLowering.cpp
11497

This pattern repeats a lot. Maybe worth introducing a "Subtarget.hasStdExtZfhorZhinx()" cover?

Also for corresponding F/D variants.

llvm/lib/Target/RISCV/RISCVInstrInfoZfh.td
474 ↗(On Diff #459306)

For the rounding modes, defining symbolic constant names would make the code easier to read. I see that this pattern exists in the current code, so this is a non-blocking suggestion.

sunshaoce marked an inline comment as done.

Address @reames's comment

jrtc27 added inline comments.Sep 28 2022, 8:50 AM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
53 ↗(On Diff #463574)

Way too generic a name that also isn't correctly capitalised

93 ↗(On Diff #463574)

Is that guaranteed to be 0?

102 ↗(On Diff #463574)

Think about whether this is actually safe, please...

sunshaoce updated this revision to Diff 464557.Oct 2 2022, 9:21 AM
sunshaoce marked an inline comment as done.

Address comment

jrtc27 added inline comments.Oct 2 2022, 9:36 AM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
53–56 ↗(On Diff #464557)

These still are not consistently named

sunshaoce updated this revision to Diff 466297.Oct 8 2022, 9:40 AM
sunshaoce marked an inline comment as done.

Rebase

sunshaoce marked 2 inline comments as done.

Address @jrtc27's comments

jrtc27 added inline comments.Oct 21 2022, 12:21 PM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
102 ↗(On Diff #463574)

I don't know why you've marked this as done; the line in question (which is now the line below) is not correct, the immediate could be 0x7f[cdef] at which point the arithmetic here will wrap to 0x80[0123] (which the machine verifier probably catches as out of range due to the immediate being signed?..). That is:

void
foo(void *p, double d)
{
    *(double *)((char *)p + 0x7fc) = d;
}

will be miscompiled, surely.

jrtc27 added inline comments.Oct 21 2022, 12:22 PM
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp
102 ↗(On Diff #463574)

Note this doesn't have to mean the store is to an unaligned location, either, p could be unaligned and then the addition of 0x7fc results in a well-aligned location. Though we want to support unaligned accesses anyway so that shouldn't be relevant.

We plan to split this patch into the following four parts:

  • zfinx
  • zdinx on rv32
  • zdinx on rv64
  • zhinx, zhinxmin

And, we will address Jessica’s comments in the new patch. Thanks!

asb added a comment.EditedApr 21 2023, 8:18 AM

We plan to split this patch into the following four parts:

  • zfinx
  • zdinx on rv32
  • zdinx on rv64
  • zhinx, zhinxmin

And, we will address Jessica’s comments in the new patch. Thanks!

Thank you - I think this is a piece of work where you need to have implemented everything to be sure that your approach works, but splitting to allow reviewing it in smaller chunks will be much better.

We plan to split this patch into the following four parts:

  • zfinx
  • zdinx on rv32
  • zdinx on rv64
  • zhinx, zhinxmin

And, we will address Jessica’s comments in the new patch. Thanks!

Thank you - I think this is a piece of work where you need to have implemented everything to be sure that your approach works, but splitting to allow reviewing it in smaller chunks will be much better.

Thanks, we will make sure to do it well.

sunshaoce abandoned this revision.May 24 2023, 11:15 PM
evandro removed a subscriber: evandro.May 25 2023, 1:42 PM