This is an archive of the discontinued LLVM Phabricator instance.

[LoongArch] Add basic support to AsmParser
ClosedPublic

Authored by SixWeining on Feb 24 2022, 5:34 AM.

Details

Summary

This patch adds basic support to AsmParser which can handle basic
instructions with register or immediate operands. With the addition of
the parser, now it's possible to test instructions encoding with llvm-mc.

Depends on D120545

Disassembler will be added later and then we can do round-trip test.

Diff Detail

Event Timeline

SixWeining created this revision.Feb 24 2022, 5:34 AM
SixWeining requested review of this revision.Feb 24 2022, 5:34 AM
Herald added a project: Restricted Project. · View Herald TranscriptFeb 24 2022, 5:34 AM
myhsu added inline comments.Feb 24 2022, 10:59 AM
llvm/lib/Target/LoongArch/AsmParser/CMakeLists.txt
9

nit: lexical ordering

llvm/lib/Target/LoongArch/AsmParser/LoongArchAsmParser.cpp
396

can we use Twine here instead?

439

I'm not sure but do you think this will improve readability?

llvm/lib/Target/LoongArch/CMakeLists.txt
8

why did the diff show disassembler changes here?

llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchMCCodeEmitter.cpp
81 ↗(On Diff #411096)

Why did you include code emitter changes here? can you split these (and the EncoderMethod changes in the tablegen files) into another patch?

Address @myhsu comments.

Thanks for your review @myhsu. I have updated this revision to address your comments.

llvm/lib/Target/LoongArch/AsmParser/CMakeLists.txt
9

No problem. Thanks.

llvm/lib/Target/LoongArch/AsmParser/LoongArchAsmParser.cpp
396

I'm afraid not because Twine does not override the += operator which is used in the following for loop.

439

Thanks. I think it make sense.

llvm/lib/Target/LoongArch/CMakeLists.txt
8

Oh, sorry. I forget to move it to D120477. Thanks!

llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchMCCodeEmitter.cpp
81 ↗(On Diff #411096)

These 2 methods are used by assembler to encode instruction's unusual immediate operand. The word unusual means the encoding of an immediate in instruction is not the same as its binary representation. For example, the bl instruction requires a signed 28-bits integer as its operand and the low 2-bits must be 0. So assembler only need to encode the uppper 26-bits into instruction's encoding.

So I think these changes could not be moved otherwise the MC test will fail.

myhsu added inline comments.Feb 24 2022, 8:40 PM
llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchMCCodeEmitter.cpp
81 ↗(On Diff #411096)

but without these two instructions to encode unusual, why didn't the previous version of tests (.mir files) also fail?

I have splited the EncoderMethod changes to D120545. Thanks!

llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchMCCodeEmitter.cpp
81 ↗(On Diff #411096)

Good question! I haven't realized this problem.
The previous .mir test uses incorrect MIR, for example:

llvm/test/CodeGen/LoongArch/misc.mir

# -------------------------------------------------------------------------------------------------
#                                           Encoding format: I26
# -------------------------------------------------------------------------------------------------
# ------------------+-----------------------------------------------+------------------------------
#  31 30 29 28 27 26 25 24 23 22 21 20 19 18 17 16 15 14 13 12 11 10 09 08 07 06 05 04 03 02 01 00
# ------------------+-----------------------------------------------+------------------------------
#     opcode        |                   imm26{15-0}                 |       imm26{25-16}
# ------------------+-----------------------------------------------+------------------------------

---
# CHECK-LABEL: test_BL:
# CHECK-ENC: 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
# CHECK-ASM: bl 34
name: test_BL
body: |
  bb.0:
    BL 34
...

Here, 34 is an illegal immediate operand for the BL instruction. The reason why it doesn't fails is getImmOpValueAsr2 is not added at that time.

So, I think the right way is just as you said "split these (and the EncoderMethod changes in the tablegen files) into another patch" and fix the .mir test.

Thanks again!

SixWeining updated this revision to Diff 411351.EditedFeb 25 2022, 1:47 AM

Split the EncoderMethod changes to D120545 and keep the previous .mir test.

myhsu added a comment.Feb 25 2022, 3:05 PM

The previous .mir test uses incorrect MIR
Here, 34 is an illegal immediate operand for the BL instruction. The reason why it doesn't fails is getImmOpValueAsr2 is not added at that time.

That makes more sense. Just make sure you really fix it in this patch.
Thanks for the clarifications.

myhsu accepted this revision.Feb 25 2022, 3:14 PM

LGTM

This revision is now accepted and ready to land.Feb 25 2022, 3:14 PM
xen0n added inline comments.Feb 27 2022, 7:44 AM
llvm/test/MC/LoongArch/Directives/cfi-valid.s
9 ↗(On Diff #411351)

Maybe add checks for other registers as well?

MaskRay requested changes to this revision.Feb 27 2022, 11:10 PM
MaskRay added inline comments.
llvm/lib/Target/LoongArch/AsmParser/LoongArchAsmParser.cpp
229

delete assert

dyn_cast asserts non-null

271

delete return

279
345

The code self explains. Delete comment

348

The code self explains. Delete comment

llvm/test/MC/LoongArch/ISA/Basic/Integer/invalid.s
8 ↗(On Diff #411351)

[[@LINE]] is deprecated FileCheck pattern. Use [[#@LINE]]

This revision now requires changes to proceed.Feb 27 2022, 11:10 PM
MaskRay added inline comments.Feb 27 2022, 11:20 PM
llvm/lib/Target/LoongArch/AsmParser/LoongArchAsmParser.cpp
148

These functions are unused. Usually it's better to add them in the patches that need them.

If the function is so simple, perhaps just use isUImm<3>. It just requires 2 more characters.

312

If operand was parsed, returns false, else ... => Return true upon an error.

317

The code self explains.

if (parseRegister(Operands) == MatchOperand_Success || parseImmediate(Operands) == MatchOperand_Success)
  return false;
337

This should be consumed. Not consuming it may cause some diagnostic line:column glitch, but I haven't tested.

parseOptionalToken can consume the token.

SixWeining marked 4 inline comments as done.

Address @MaskRay and @xen0n's comments. Thanks!

llvm/lib/Target/LoongArch/AsmParser/LoongArchAsmParser.cpp
148

These functions are used by tablegen-ed code in LoongArchGenAsmMatcher.inc.

229

OK. Thanks!

312

OK.

317

OK. Thanks.

337

Yes. Sorry I missed. Thanks.

llvm/test/MC/LoongArch/Directives/cfi-valid.s
9 ↗(On Diff #411351)

OK. Thanks, let me add all others.

xen0n added inline comments.Feb 28 2022, 8:43 AM
llvm/test/MC/LoongArch/ISA/Basic/Integer/invalid.s
171–172 ↗(On Diff #411795)
# Instructions outside the base ISA
# TODO: Test instructions in LSX/LASX/LBT/LVZ after their introduction.
MaskRay added inline comments.Feb 28 2022, 11:57 AM
llvm/test/MC/LoongArch/Directives/cfi-invalid.s
8 ↗(On Diff #411795)

Should test line/column
:[[#@LINE+1]]:...: error:

llvm/test/MC/LoongArch/Directives/cfi-valid.s
49 ↗(On Diff #411795)

Unsure it is useful enumerating all values. Having the minimum, the maximum, and a random middle value is sufficient.

llvm/test/MC/LoongArch/Directives/data-valid.s
1 ↗(On Diff #411795)

It's some folks's rule and used heavily in binary utilities: prefer ## for non-RUN-non-CHECK comments so that they stand out (many editors highlight them), and in case you use some generator to generate the tests, the generator can be taught to retain ## comments while scrubbing other comments.

3 ↗(On Diff #411795)

I find it sometimes more readable by combing non-error and error cases in one file. See --defsym=ERR=1 in test/MC.
I usually place error cases after non-error cases.

13 ↗(On Diff #411795)

I'd write:

# CHECK:      .byte 1
# CHECK-NEXT: .byte 255
.byte 1
.byte 0xff

# CHECK:      .half 1
# CHECK-NEXT: .half 65535
.half 0x1
.half 0xffff

...

It's important to use CHECK-NEXT more than CHECK to detect that the end-of-line is correctly handled.

37 ↗(On Diff #411795)

Many values just duplicate and unneeded.

llvm/test/MC/LoongArch/ISA/Basic/Integer/arith.s
4 ↗(On Diff #411795)

The prevailing style is to use double-dash options for FileCheck. --check-prefix is more common when there is just one prefix.

SixWeining updated this revision to Diff 412006.Mar 1 2022, 1:25 AM
SixWeining marked 3 inline comments as done.

Address @MaskRay and @xen0n's comments.

llvm/test/MC/LoongArch/Directives/cfi-invalid.s
8 ↗(On Diff #411795)

Sorry I missed that. Thanks.

llvm/test/MC/LoongArch/Directives/data-valid.s
1 ↗(On Diff #411795)

Thanks. I will follow this implicit standard.

13 ↗(On Diff #411795)

OK. Thanks.

llvm/test/MC/LoongArch/ISA/Basic/Integer/arith.s
4 ↗(On Diff #411795)

Yes. And also for --triple=. Thanks.

llvm/test/MC/LoongArch/ISA/Basic/Integer/invalid.s
171–172 ↗(On Diff #411795)

OK. Thanks.

The file hierarchy llvm/test/MC/LoongArch/ISA/Basic/Integer/arith.s looks too long to me.

Do ISA and Basic in the path components improve clarity? If not, consider llvm/test/MC/LoongArch/Integer/arith.s
Note that for test/MC/$target/, we expect that most tests are ISA related...

llvm/lib/Target/LoongArch/AsmParser/LoongArchAsmParser.cpp
337

You can use parseOptionalToken :)

The comment hasn't been marked done.

339

ditto, parseOptionalToken

345

ditto. switch then and else, and use parseOptionalToken

431

Do you have negative tests for the invalid immediate Upper+1? There should be one for each immediate form.

llvm/test/MC/LoongArch/Directives/data.s
2

Nit:

# RUN: llvm-mc --triple=loongarch32 %s | FileCheck %s
# RUN: llvm-mc --triple=loongarch64 %s | FileCheck %s
# RUN: not llvm-mc --triple=loongarch32 --defsym=ERR=1 %s 2>&1 \
# RUN:     | FileCheck %s --check-prefix=CHECK-ERR
# RUN: not llvm-mc --triple=loongarch64 --defsym=ERR=1 %s 2>&1 \
# RUN:     | FileCheck %s --check-prefix=CHECK-ERR

positive tests ...

.ifdef ERR
negative tests ...
.endif
llvm/test/MC/LoongArch/ISA/Basic/Integer/misc.s
6 ↗(On Diff #412006)

CHECK32-ASM may be confusing when it actually apples to both 32-bit and 64-bit.

Perhaps CHECK-ASM

SixWeining marked 6 inline comments as done.Mar 1 2022, 6:44 PM

The file hierarchy llvm/test/MC/LoongArch/ISA/Basic/Integer/arith.s looks too long to me.

Do ISA and Basic in the path components improve clarity? If not, consider llvm/test/MC/LoongArch/Integer/arith.s
Note that for test/MC/$target/, we expect that most tests are ISA related...

About the hierarchy, since relocations and LASX/LSX/LVZ would be introduced in future and LASX/LSX apply both to Integer and FloatingPoint, so my plan is

LoongArch/
├── Directives
├── ISA
│   ├── Basic
│   │   ├── FloatingPoint
│   │   ├── Integer
│   │   ├── Privileged
│   ├── LASX
│   └── LSX
│   └── LVZ
└── Misc
└── Relocations

How about this:

LoongArch/
├── Basic
│   ├── FloatingPoint
│   ├── Integer
│   └── Privileged
├── Directives
├── LASX
├── LSX
├── LVZ
├── Misc
└── Relocations
llvm/lib/Target/LoongArch/AsmParser/LoongArchAsmParser.cpp
431

Yes, I have. They are in test/MC/LoongArch/ISA/Basic/Integer/invalid.s.

llvm/test/MC/LoongArch/ISA/Basic/Integer/misc.s
6 ↗(On Diff #412006)

Thanks. Should be CHECK-ASM.

Herald added a project: Restricted Project. · View Herald TranscriptMar 1 2022, 6:44 PM
SixWeining updated this revision to Diff 412305.Mar 1 2022, 7:00 PM
SixWeining marked an inline comment as done.

Continue to address @MaskRay's new comments.

MaskRay accepted this revision.Mar 2 2022, 10:29 AM

LGTM. I don't know the ISA. Would be great to hear from @xen0n

This revision is now accepted and ready to land.Mar 2 2022, 10:29 AM
xen0n added a comment.Mar 2 2022, 6:11 PM

Looks good ISA implementation-wise; I haven't independently reviewed the instruction encodings, but the immediate operand handling is consistent with manual syntax and binutils behavior.

llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
65

I may have missed this in the previous patches, but why does only this kind of imm get the ImmLeaf treatment? I briefly looked at the other targets and they seem to mark most of their commonly used immediate kinds as such.

llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchMCTargetDesc.cpp
109–111

Considering that every line in this function is self-documenting, do we really need any of them? At least with the addition below, *this* line becomes inaccurate...

SixWeining added inline comments.Mar 3 2022, 9:23 PM
llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
65

The ImmLeaf is used in instruction selection stage to match the immediate. Yes, all immediate kinds that could be used in instruction selection should be defined with ImmLeaf. They will added when we implement the codegen.

The reason why only simm12 is defined with ImmLeaf is that in line 491 we defined a few instruction selection Pats:

...
def : PatGprImm<add, ADDI_W, simm12>;
...

These Pats here are just to demonstrate layout of this .td file and the way to define Pat. Of course, more Pats will be defined later to support codegen.

Sorry for the confusion. I hope I make it clear now.

llvm/lib/Target/LoongArch/MCTargetDesc/LoongArchMCTargetDesc.cpp
109–111

I agree with you. Let me delete them in a seperate patch. Thanks.

SixWeining updated this revision to Diff 413170.Mar 4 2022, 5:03 PM

data.s: check data encodings via llvm-objdump

xen0n accepted this revision.Mar 8 2022, 10:38 PM

Although I only randomly verified one instruction's encoding (by hand and cross-checking with binutils), as the testcases seem to be auto-generated, I feel the rest should be okay as well (or you probably would not get very far in real world usage).

llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
65

Okay, understood. It's generally better to only add things when absolutely needed though, this way you don't have to explain every time that these are "to be used later".

SixWeining added inline comments.Mar 8 2022, 11:53 PM
llvm/lib/Target/LoongArch/LoongArchInstrInfo.td
65

Yes, thanks.

This revision was landed with ongoing or failed builds.Mar 9 2022, 12:26 AM
This revision was automatically updated to reflect the committed changes.