Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Unit Tests
Event Timeline
Note: I get a crash when I try to compile some LLVM IR. Crash trace:
$ bin/llc --march riscv32 -mattr +zfinx test.ll LLVM ERROR: Cannot select: t8: ch = store<(store (s32) into %ir.4)> t7, t4, t6, undef:i32 t4: f32,ch = load<(dereferenceable load (s32) from %ir.2)> t0, FrameIndex:i32<1>, undef:i32 t1: i32 = FrameIndex<1> t3: i32 = undef t6: i32,ch = load<(dereferenceable load (s32) from %ir.1, align 8)> t0, FrameIndex:i32<0>, undef:i32 t5: i32 = FrameIndex<0> t3: i32 = undef t3: i32 = undef In function: _Z10sum_floatsPfjS_ PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: bin/llc --march riscv32 -mattr +zfinx test.ll 1. Running pass 'Function Pass Manager' on module 'test.ll'. 2. Running pass 'RISCV DAG->DAG Pattern Instruction Selection' on function '@_Z10sum_floatsPfjS_' Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it): 0 llc 0x000000010974962d llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) + 61 1 llc 0x0000000109749bdb PrintStackTraceSignalHandler(void*) + 27 2 llc 0x000000010974785b llvm::sys::RunSignalHandlers() + 139 3 llc 0x000000010974b7d8 SignalHandler(int) + 232 4 libsystem_platform.dylib 0x00007fff206e5d7d _sigtramp + 29 5 libsystem_platform.dylib 000000000000000000 _sigtramp + 18446603339972059808 6 libsystem_c.dylib 0x00007fff205f5406 abort + 125 7 llc 0x00000001095aae4a llvm::report_fatal_error(llvm::Twine const&, bool) + 394 8 llc 0x0000000109432305 llvm::SelectionDAGISel::CannotYetSelect(llvm::SDNode*) + 725 9 llc 0x000000010942dd72 llvm::SelectionDAGISel::SelectCodeCommon(llvm::SDNode*, unsigned char const*, unsigned int) + 26690 10 llc 0x0000000106faec6c llvm::RISCVDAGToDAGISel::SelectCode(llvm::SDNode*) + 44 11 llc 0x0000000106fade73 llvm::RISCVDAGToDAGISel::Select(llvm::SDNode*) + 31299 12 llc 0x000000010941e7fd llvm::SelectionDAGISel::DoInstructionSelection() + 1629 13 llc 0x000000010941cf4d llvm::SelectionDAGISel::CodeGenAndEmitDAG() + 6205 14 llc 0x000000010941b56f llvm::SelectionDAGISel::SelectBasicBlock(llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, true, false, void>, false, true>, llvm::ilist_iterator<llvm::ilist_detail::node_options<llvm::Instruction, true, false, void>, false, true>, bool&) + 399 15 llc 0x0000000109419649 llvm::SelectionDAGISel::SelectAllBasicBlocks(llvm::Function const&) + 6329 16 llc 0x0000000109415d2c llvm::SelectionDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 2268 17 llc 0x0000000106fb0668 llvm::RISCVDAGToDAGISel::runOnMachineFunction(llvm::MachineFunction&) + 56 18 llc 0x0000000107cc0dcd llvm::MachineFunctionPass::runOnFunction(llvm::Function&) + 541 19 llc 0x0000000108645dbc llvm::FPPassManager::runOnFunction(llvm::Function&) + 700 20 llc 0x000000010864dce5 llvm::FPPassManager::runOnModule(llvm::Module&) + 117 21 llc 0x0000000108646784 (anonymous namespace)::MPPassManager::runOnModule(llvm::Module&) + 772 22 llc 0x00000001086462a8 llvm::legacy::PassManagerImpl::run(llvm::Module&) + 296 23 llc 0x000000010864e081 llvm::legacy::PassManager::run(llvm::Module&) + 33 24 llc 0x0000000106eeba16 compileModule(char**, llvm::LLVMContext&) + 6742 25 llc 0x0000000106ee977d main + 1501 26 libdyld.dylib 0x00007fff206bbf3d start + 1 27 libdyld.dylib 0x0000000000000006 start + 18446603339972231370 Abort trap: 6
Corresponding LLVM IR:
define dso_local void @_Z10sum_floatsPfjS_() #0 { %1 = alloca float*, align 8 %2 = alloca float, align 4 %3 = load float, float* %2, align 4 %4 = load float*, float** %1, align 8 store float %3, float* %4, align 4 ret void }
(Note that it compiles just fine with +f (but using float registers...), or with nothing (but uses softwawre emulation for the fadd)
(Edited to shorten llvm ir code a lot :) )
Ok, I've since learned that this PR doesnt handle load/store. Only arithmetic.
However note that immediate float arithmetic also seems to be missing, e.g. the following crashes
define noundef float @_Z4funcf(float noundef %0) #0 { %2 = fadd float %0, 3.000000e+00 ret float %2 }
Compiled with:
bin/llc --march riscv32 -mattr +zfinx test_fadd_imm.ll
Gives:
LLVM ERROR: Cannot select: t15: f32,ch = load<(load (s32) from constant-pool)> t0, t19, undef:i32 t19: i32 = ADDI t18, TargetConstantPool:i32<float 3.000000e+00> 0 [TF=3] t18: i32 = LUI TargetConstantPool:i32<float 3.000000e+00> 0 [TF=4] t16: i32 = TargetConstantPool<float 3.000000e+00> 0 [TF=4] t17: i32 = TargetConstantPool<float 3.000000e+00> 0 [TF=3] t14: i32 = undef In function: _Z4funcf
(but compiles ok with +f)
llvm/lib/Target/RISCV/RISCVInstrInfoF.td | ||
---|---|---|
544 | I think it's confusing that FINX in this context means F and F_INX. I feel it might be good to make more explicit that FINX includes F. How to do that is an open question. I'd suggest FFINX, but FFINX is already taken to mean FF with FF_INX. Maybe we need an explicit acronym to mean 'F and F_INX', like e.g. 'ALLF'. Or perhaps use a verbose form like 'F_AND_FINX'? (If it was my own decision I'd go for the explicit F_AND_FINX) |
Also, not really part of this PR, but I don't know where else to put this. This code:
def FF : ExtInfo_rr<FExt, FPR32, FPR32>; def FF_INX : ExtInfo_rr<ZfinxExt, FPR32INX, FPR32INX>; def FX : ExtInfo_rr<FExt, FPR32, GPR>; def FX_INX : ExtInfo_rr<ZfinxExt, FPR32INX, GPR>; def FX_64 : ExtInfo_rr<F64Ext, FPR32, GPR>; def FX_INX_64 : ExtInfo_rr<Zfinx64Ext, FPR32INX, GPR>; def XF : ExtInfo_rr<FExt, GPR, FPR32>; def XF_64 : ExtInfo_rr<F64Ext, GPR, FPR32>; def XF_INX : ExtInfo_rr<ZfinxExt, GPR, FPR32INX>; def XF_INX_64 : ExtInfo_rr<Zfinx64Ext, GPR, FPR32INX>;
... I feel it would be good to add a docstring explaining what this is, like e.g. something like:
Each of these definitions represents a single combination of register spaces. For example, `FX` means that the destination register is a float register (`FPR32`), e.g. `ft1`, and that the source register is a general `x` register (`GPR`), e.g. `x1`. The `_INX` versions are for the `+zfinx` extension. In this case, `FPR32INX` is effectively a synonym in practice for `GPR`. At least `FPR32INX` are emitted as general `x` registers, even though they 'look like' float registers for much of the tablegen process.
Thanks, I'll try to fix it.
llvm/lib/Target/RISCV/RISCVInstrInfoF.td | ||
---|---|---|
544 | Your suggestion is very good, but I think FINX should be replaced with a shorter name because it appears as a parameter, eg F_INX. And consider changing the name of F_INX to FprINX. Would you like to submit a new patch about this? |
As a general comment, after thinking about this whilst brushing my teeth, I kind of feel that it might be clearer and simpler for everyone, both F people and Zfinx people, if we decoupled F and Zfinx into two separate files. This way F people don't have to figure out what is Zfinx. They just keep their clean separate F file. and Zfinx people dont have to think about how our bits mangle with F bits. Yes involves a bit of duplication, but if it's clearer/cleaner for everyone, maybe it might be a good approach?
After all, Zfinx is not an extension of F. It's a separate, mutually incompatible extension. And it's a fairly niche extension, I feel.
(Conversely, if we feel that zfinx is a non-niche extension, then it should have a shorter name I feel, eg maybe U for untyped registers)
Hi Hugh, and welcome! The naming of the extension is outside of our control I'm afraid.
I see your point, though the main reasoning about keeping F and Zfinx together is that they are largely the same instructions, and the hope is to define instructions and patterns together where feasible.
Yeah, I totally understand that. And when I first looked at this PR, I thought that made total sense. But then I saw how challenging it is to try to combine the two.
As far as the general principle of 'lets keep things together with similar patterns', the same argument can be levied against the pair of F and D, and yet they are in separate files, and I currently feel putting F and D in separate files is likely a reasonable idea. Despite their apparent similarities (32 vs 64 bits...), it is simpler and easier to keep D and F separate. Compared to D vs F, F vs Zfinx are I feel more different, and there is more reason to keep them apart. But even if they were the same difference as D and F, I feel keeping Zfinx and F apart coudl be a good idea:
- Zfinx is fairly niche, and seems unfair to 'pollute' F with it?
- I suspect Zfinx might be easier to implement if it is kept separaet fomr F.
Anyway, I seem likely to at least dabble in creating a decoupled version of Zfinx. I sort of have a hunch it will be eiaser. If it's not, well, I will come back, and say "well, yes, you were right, it is good to keep Zfinx and F together because ... [reasons X, Y, Z]".
@hughperkins: You make good points. It's not immediately obvious what kind of splitting would be better, and at what point the added complexity of trying to have piggy-back zfinx on current F/D patterns hampers readability and maintainability rather than enhance it. If you have time to play around with some options, I'd be really interested in hearing what you come up with.
By the way, whilst I remember, since @sunshaoce has implemented all the functionality I need (for now), so I'm not intending to spend any time on working on this. Everything I tested in this PR worked great for me :)
A couple of very high level review comments.
- This patch commingles the assembler/disassembly support and the pattern matching required for IR lowering. The typical process for an extension is to start with a patch specific to the MC layer (i.e. assembler/disassembler), and then move to codegen support in a later patch. I strongly strongly recommend you split if you want meaningful chance of review.
- You don't appear to have any assembler tests. That's a show stopper right there.
- On the MC part, please try to get the new extension as separate as possible from the existing floating point ones. I realize there's some potential for code savings, but the change will be lower risk if it touches as little as possible. Once it's in, and tested, changes to share code incrementally would be reasonable.
I strongly recommend you address these points. This patch is unlikely to get meaningful review without them addressed.
Assembler and disassembler support was already commited in https://reviews.llvm.org/D93298
llvm/lib/Target/RISCV/RISCVInstrInfoD.td | ||
---|---|---|
105 ↗ | (On Diff #456908) | Is this something that should have been in the MC layer patch? |
llvm/lib/Target/RISCV/RISCVInstrInfoD.td | ||
---|---|---|
105 ↗ | (On Diff #456908) | No. These don't exist. From the spec: Load-pair and store-pair instructions are not provided, so transferring double-precision operands in RV32Zdinx from or to memory requires two loads or stores. |
Then why does this contain what look to be MC related changes to .td files? Looking closer, I do see that something has already landed, but there's definitely MC pieces in this. Confusing to say the least.
Can you point to a specific example?
llvm/lib/Target/RISCV/RISCVInstrInfoD.td | ||
---|---|---|
105 ↗ | (On Diff #456908) | Are these expanded somewhere that I'm not seeing? |
llvm/test/CodeGen/RISCV/double-mem.ll | ||
---|---|---|
22 ↗ | (On Diff #456908) | This seems to be emitting an ld instruction in rv32 which is illegal. |
llvm/lib/Target/RISCV/RISCVInstrInfoD.td | ||
---|---|---|
105 ↗ | (On Diff #456908) | Nope, you can see them in the test output for the RV32Zdinx tests... it's just wrong. Whether these should be pseudos or we should be teaching the legaliser that f64 is legal except for loads/stores I don't know; the latter would be nice, but the interaction with spills sounds bad. I had a similar issue downstream in CHERI-LLVM where, depending on the mode and pointer address space, there is no FLD/FSD available (when the pointer address space doesn't match the default for the encoding mode used for the current ABI). I used BuildPairF64Pseudo on loading from the address and loading from adding 4 to the address in the TableGen patterns, and added new (Cheri)SplitStoreF64Pseudo's for the store side that do SplitF64Pseudo plus storing to additional addresses. Presumably some of that could be reused if adopting that approach rather than legalisation, though if there's a nicer way without going to legalisation I'd be interested to see it as it's rather ugly (and the only user I can see of (KILL ...) in TableGen patterns themselves...). https://github.com/CTSRD-CHERI/llvm-project/pull/635/files, for what it's worth. |
I think I was seeing the FLD_IN32X definition from the Aug 28th version of the patch. This has since been removed, and even at the time of my comment, I must have been looking at a stale diff. Sorry for the noise.
Two minor optional style comments.
llvm/lib/Target/RISCV/RISCVISelLowering.cpp | ||
---|---|---|
11721 | This pattern repeats a lot. Maybe worth introducing a "Subtarget.hasStdExtZfhorZhinx()" cover? Also for corresponding F/D variants. | |
llvm/lib/Target/RISCV/RISCVInstrInfoZfh.td | ||
474 ↗ | (On Diff #459306) | For the rounding modes, defining symbolic constant names would make the code easier to read. I see that this pattern exists in the current code, so this is a non-blocking suggestion. |
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp | ||
---|---|---|
53–56 ↗ | (On Diff #464557) | These still are not consistently named |
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp | ||
---|---|---|
102 ↗ | (On Diff #463574) | I don't know why you've marked this as done; the line in question (which is now the line below) is not correct, the immediate could be 0x7f[cdef] at which point the arithmetic here will wrap to 0x80[0123] (which the machine verifier probably catches as out of range due to the immediate being signed?..). That is: void foo(void *p, double d) { *(double *)((char *)p + 0x7fc) = d; } will be miscompiled, surely. |
llvm/lib/Target/RISCV/RISCVExpandPseudoInsts.cpp | ||
---|---|---|
102 ↗ | (On Diff #463574) | Note this doesn't have to mean the store is to an unaligned location, either, p could be unaligned and then the addition of 0x7fc results in a well-aligned location. Though we want to support unaligned accesses anyway so that shouldn't be relevant. |
We plan to split this patch into the following four parts:
- zfinx
- zdinx on rv32
- zdinx on rv64
- zhinx, zhinxmin
And, we will address Jessica’s comments in the new patch. Thanks!
Thank you - I think this is a piece of work where you need to have implemented everything to be sure that your approach works, but splitting to allow reviewing it in smaller chunks will be much better.
This pattern repeats a lot. Maybe worth introducing a "Subtarget.hasStdExtZfhorZhinx()" cover?
Also for corresponding F/D variants.