With the benefit of D88392, instruction encoding and mnemonic testing can be
achieved within MIR files before AsmParser is ready. This patch adds such
tests which cover all basic integer instructions we defined in previous patch.
Similarly those tests will be rewrote by .s and moved to test/MC/LoongArch.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
These encoding format names are atrocious to look at and use, especially the "non-standard" ones not mentioned in the manuals yet still have to be named. The names mostly start with a number so cannot be used directly as identifiers, and the irregular formats are named after instructions themselves -- what if some instructions are added later that happen to share the same encoding, like renaming the format is okay?
For fixing these problems once and for good, please consider adopting the unambiguous notation defined by the loongarch-opcodes project. While you may or may not like the idea of renaming a few instruction mnemonics, the instruction format naming is fixed for you, and I really hope the official manuals could follow suit and just ditch the xRIxx names.
Also please separate the changes adding --little-endian into a separate patch, so we only tackle one concern in one commit.
Just want to double check: AsmParser is not implemented yet, is that correct?
Also, can you rename the test filename to be more expressive and straightforward.
llvm/utils/extract-section.py | ||
---|---|---|
62 ↗ | (On Diff #395256) | nit: can we use --endian=<little/big> instead with default value being "big"? |
@xen0n Thanks for your suggestion. We have read the notation defined in the repo you referenced. Yes, one name can exactly specifies one encoding. But there are a couple of issues left to adopt it in llvm. For example:
- immediate operand
Some immediate operands usually have specail meanings like the msbd and lsbd of the bstrpick.d instruction. This instruction will be named with DJUk6Um6 under the natation you proposed and thus lost the meaning of msbd and lsbd because we have to use names like imm1 and imm2 in instruction definition.
In current impl:
// FmtBSTR_D // <opcode | msbd | lsbd | rj | rd> class FmtBSTR_D<bits<10> op, dag outs, dag ins, string asmstr, list<dag> pattern = []> : LAInst<outs, ins, asmstr, pattern> { bits<6> msbd; bits<6> lsbd; bits<5> rj; bits<5> rd; let Inst{31-22} = op; let Inst{21-16} = msbd; let Inst{15-10} = lsbd; let Inst{9-5} = rj; let Inst{4-0} = rd; } ... class ALU_BSTRD<bits<10> op, string opstr, Operand ImmOpnd> : FmtBSTR_D<op, (outs GPR:$rd), (ins GPR:$rj, ImmOpnd:$msb, ImmOpnd:$lsb), !strconcat(opstr, "\t$rd, $rj, $msb, $lsb")>; ... def BSTRPICK_D : ALU_BSTRD<0b0000000011, "bstrpick.d", uimm6>;
Wth your proposal:
// DJUk6Um6 // <opcode | imm1 | imm2 | rj | rd> class DJUk6Um6<bits<10> op, dag outs, dag ins, string asmstr, list<dag> pattern = []> : LAInst<outs, ins, asmstr, pattern> { bits<6> imm1; bits<6> imm2; bits<5> rj; bits<5> rd; let Inst{31-22} = op; let Inst{21-16} = imm1; let Inst{15-10} = imm2; let Inst{9-5} = rj; let Inst{4-0} = rd; } ... def BSTRPICK_D : DJUk6Um6<0b0000000011, (outs GPR:$rd), (ins GPR:$rj, uimm6:$imm1, uimm6:$imm2), !strconcat(opstr, "\t$rd, $rj, $imm1, $imm2")>;
- Understanding cost
Inoder to understand what the name like DJUk6Um6 means people may have to spend more time than the current impl.
But we do not totally disagree this proposal and let's wait more reviewers' comments about that. Thanks.
Yes. AsmParser is not implemented yet which is in our next plan. Until that these MIR tests would be removed and rewrote with .s MC tests and put to llvm/tests/MC/LoongArch like the M68K target does.
About the tes filename, we'll update them until the comments from @xen0n (about the instruction format names) are addressed.
Hi,
Thanks for your detailed reply; although I'd like to clarify a bit. Instruction formats are for describing fixed and variable bit patterns as recognized by the decoder hardware; they are very low-level details that should not be concerned with higher-level semantics (instead, opcodes and mnemonics do).
Take the bstr* family for example: there is nothing stopping you from creating completely unrelated insns also taking two registers and two 6-bit immediates, and with LoongArch encoding conventions (slots placed from LSB to MSB mostly sequentially) this would almost definitely just be DJUk6Um6. With your current approach, the new insns would be of BSTR_D format, yet they don't belong to bstr* family at all.
In addition to that, there's no reason to only "give meaning" to special formats alone; why not rename "2R" to "BINARY_ARITH", "2RI12" to "MEM" and "I26" to "LONG_BRANCH"? Again this is just exposing inconsistencies and double-standards in the current approach; some semantics just have to be looked up in the manuals and I believe this is acceptable for compiler maintainers.
Plus, with the current naming scheme, names for all "regular" formats begin with a number, thus unusable as identifiers in most programming languages. This forces people to come up with ad-hoc names every time they have to implement LoongArch machine code in a project. Indeed, from what I see, the LoongArch port of binutils, llvm and qemu all use different names from each other, presumably because the Loongson people behind each port all come from different (sub)teams; this is not good for maintenance in the long run.
Different compiler communities use different names. This is the reality not only for LoongArch but also other targets.
Yes, I agree with you that binutils, llvm, qemu using same names is good for maintenace although this has not been achieved by other targets (like RISCV).
So, are you happy with the names defined for LoongArch in binutils?
In binutils-gdb/opcodes/loongarch-opc.c
{ 0x00c00000, 0xffc00000, "bstrpick.d", "r0:5,r5:5,u16:6,u10:6", 0, 0, 0, 0 },
We may consider using R0_5_R5_5_U16_6_U10_6 as the instruction format name for the bstrpick.d instruction.
Yeah maybe others didn't do this, but we're a new target, so we might as well do better. In the end we could even figure out the best practice for others to possibly follow, so I still think such an approach is feasible.
Also note that RISC-V actually has the centralized riscv-opcodes project for instruction encodings; and their insn format names are sane, unlike LoongArch, so in fact the RISC-V ports from different projects still can standardize naming.
So, are you happy with the names defined for LoongArch in binutils?
In binutils-gdb/opcodes/loongarch-opc.c
{ 0x00c00000, 0xffc00000, "bstrpick.d", "r0:5,r5:5,u16:6,u10:6", 0, 0, 0, 0 },We may consider using R0_5_R5_5_U16_6_U10_6 as the instruction format name for the bstrpick.d instruction.
This is even worse than ad-hoc BSTR_D names, because the name becomes too long and unwieldy to pronounce and use. (With names like DJUk6Um6 or FdFjFkCa you can at least spell every letter out individually.) And a lot of details are redundant, like the "r" prefix implying the slot is 5-bit wide, and so on.
My comment had been addressed and I'm satisfied with this patch too. @xen0n do you have any further comments?
Well, as the patch splitting is already done, and it's already decided in the 4th patch's discussion thread that the instruction format renaming is not happening at this time, I have no further comment regarding this; and I think this patch (and the initial series) can be accepted. More refactoring could always come later.
Let the fun begin!
(EDIT: it was the 4th patch's thread, not the 5th.)