This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Add support for the .insn directive
ClosedPublic

Authored by zhanjunl on Jun 28 2016, 11:46 AM.

Details

Summary

Add support for the .insn directive.

.insn is an s390 specific directive that allows encoding of an instruction
instead of using a mnemonic. The motivating case is some code in node.js that
requires support for the .insn directive.

Diff Detail

Repository
rL LLVM

Event Timeline

zhanjunl retitled this revision from to [SystemZ] Add support for .insn and .word/.short/.long/.quad directives.
zhanjunl updated this object.
zhanjunl added a reviewer: uweigand.
zhanjunl added a subscriber: llvm-commits.
koriakin added inline comments.
lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
900 ↗(On Diff #62119)

.short, .long. .quad are already supported by common code.

koriakin added inline comments.Jun 29 2016, 1:04 AM
lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
900 ↗(On Diff #62119)

Also, .word would be better implemented by using addAliasForDirective instead of reimplementing the handler.

koriakin added inline comments.Jun 29 2016, 1:52 AM
lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
545 ↗(On Diff #62119)

should be NumOperands

549 ↗(On Diff #62119)

For better or worse, gas considers the immediate here to be pc-relative and shifted.

551 ↗(On Diff #62119)

Also pc-relative and shifted.

569 ↗(On Diff #62119)

Likewise pc-relative and shifted.

590 ↗(On Diff #62119)

This is wrong.

592 ↗(On Diff #62119)

Likewise.

594 ↗(On Diff #62119)

Likewise.

697 ↗(On Diff #62119)

Looks like a job for llvm_unreachable.

962 ↗(On Diff #62119)

Would be nice to have "too few operands" error here.

972 ↗(On Diff #62119)

gas also accepts plain numbers whenever registers are accepted - while normally it's not much of an issue, it needs to be supported here, as the 'R' fields in many formats are also used for immediate arguments (eg. CC mask) for some instructions.

996 ↗(On Diff #62119)

Needs verification of immediate range (or at least clipping off the extra bits).

koriakin requested changes to this revision.Jun 29 2016, 2:25 AM
koriakin added a reviewer: koriakin.

See inline comments. Also, this might not be the good level to have .insn handling - for one, we forfeit any hope of handling fixups in immediate operands (which is necessary to handle pc-relative forms correctly). It might be better to represent these directives as special machine instructions, one per form.

This revision now requires changes to proceed.Jun 29 2016, 2:25 AM

Oh, and the .word change should be split off (it'll likely be a one-liner using the alias functionality)

koriakin added inline comments.Jun 29 2016, 2:43 AM
test/MC/SystemZ/directive-insn.s
10 ↗(On Diff #62119)

Please cross-check this test with gas, this line (and some others) assembles differently...

uweigand added inline comments.Jun 29 2016, 11:09 AM
test/MC/SystemZ/directive-insn.s
4 ↗(On Diff #62119)

We really should have tests for all the formats, since they encode differently.

I do in fact see instructions already recognized for several of these:
RIE: CRJ*, RxSBG, ...
RRF: CRT*, LOCR(G), PPA, many FP insns
SS: MVC, NC, OC, XC, CLC
RSE: SRLG etc, LMG/STMG, ...

For the others, it might be good to add some instructions in those formats, they can be useful for inline asm anyway.

From my understanding, gas only supports some of the possible s390 instruction formats when using ".insn". For example, the "ri" format in the insn directive corresponds to the RI-a format, which uses an immediate field and not a relative-immediate field. Therefore, for the .insn directive specifically:

ri = RI-a
rie = RIE-d
ril = RIL-a

I tried assembling a sample .insn directive with gas:

        .text
        .globl  main
        .type   main,@function
main:
        lghi    %r2, 0
#       brct %r2,.label
        .insn ri,0xa7060000,%r2,.label
.label:
        br      %r14
.Lfunc_end0:
        .size   main, .Lfunc_end0-main

gas will assemble the above code with no errors, but when I do an objdump of the object file, I see:

temp.o:     file format elf64-s390


Disassembly of section .text:

0000000000000000 <main>:
   0:   a7 29 00 00             lghi    %r2,0
   4:   a7 26 00 00             brct    %r2,4 <main+0x4>

0000000000000008 <.label>:
   8:   07 fe                   br      %r14
   a:   07 07                   nopr    %r7

So it looks like it was able to get the correct instruction mnemonic, but it silently fails when trying to encode the label into the instruction. This was my understanding when supporting the .insn, and so I didn't expect to have to do any fixups. Unless I'm mistaken?

Once I figure out if we're sticking with this implementation, I'll continue working on this and fix up all the comments that have been made. Thanks for the quick and thorough review guys!

test/MC/SystemZ/directive-insn.s
4 ↗(On Diff #62119)

Similar to the comment I added, rie corresponds with RIE-d instructions, of which I couldn't find any that clang supported.

Same with rrf (RRF-a,b), ss (SS-d), rse (I couldn't find an actual format that matched this class, as those instructions are RSY?).

From my understanding, gas only supports some of the possible s390 instruction formats when using ".insn". For example, the "ri" format in the insn directive corresponds to the RI-a format, which uses an immediate field and not a relative-immediate field. Therefore, for the .insn directive specifically:

ri = RI-a
rie = RIE-d
ril = RIL-a

I was refering to RIE, RIL, RSI with my comments about pc-relative operands, not RI. Here goes (gas):

nopr %r7      
.insn rie, 0xec0000000044, 1, 2, label
nopr %r7
label:
Disassembly of section .text:

0000000000000000 <label-0xa>:
   0:   07 07                   nopr    %r7
   2:   ec 12 00 04 00 44       brxhg   %r1,%r2,a <label>
   8:   07 07                   nopr    %r7

000000000000000a <label>:
   a:   07 07                   nopr    %r7

So this is definitely supported.

I tried assembling a sample .insn directive with gas:

        .text
        .globl  main
        .type   main,@function
main:
        lghi    %r2, 0
#       brct %r2,.label
        .insn ri,0xa7060000,%r2,.label
.label:
        br      %r14
.Lfunc_end0:
        .size   main, .Lfunc_end0-main

gas will assemble the above code with no errors, but when I do an objdump of the object file, I see:

temp.o:     file format elf64-s390


Disassembly of section .text:

0000000000000000 <main>:
   0:   a7 29 00 00             lghi    %r2,0
   4:   a7 26 00 00             brct    %r2,4 <main+0x4>

0000000000000008 <.label>:
   8:   07 fe                   br      %r14
   a:   07 07                   nopr    %r7

So it looks like it was able to get the correct instruction mnemonic, but it silently fails when trying to encode the label into the instruction. This was my understanding when supporting the .insn, and so I didn't expect to have to do any fixups. Unless I'm mistaken?

Again, this is RI, which gas considers to be RI-a, which is not PC-relative. Also, please disassemble your code with objdump -dr:

Disassembly of section .text:

0000000000000000 <main>:
   0:   a7 29 00 00             lghi    %r2,0
   4:   a7 26 00 00             brct    %r2,4 <main+0x4>
                        6: R_390_16     .text+0x8

0000000000000008 <.label>:
   8:   07 fe                   br      %r14
   a:   07 07                   nopr    %r7

Here goes the fixup.

uweigand edited edge metadata.Jun 30 2016, 6:24 AM

From my understanding, gas only supports some of the possible s390 instruction formats when using ".insn". For example, the "ri" format in the insn directive corresponds to the RI-a format, which uses an immediate field and not a relative-immediate field. Therefore, for the .insn directive specifically:

ri = RI-a
rie = RIE-d
ril = RIL-a

Well, sort of ... it is true that each .insn format corresponds to one specific s390 subformat, but for the most part, you can also use .insn to encode other subformats, e.g. by passing a mask in a register field, or by setting a field to 0. Existing code using inline asm does that ...

However, there are a few subformats that indeed currently cannot be encoded, e.g. RI-b or RI-c, the RI variants with a PC-relative operand. So my earlier comment was indeed not quite correct.

Looking at those formats again:

RIE: .insn directly models RIE-e, which is only used by BRXLG/BRXHG, which LLVM does not current have (but probably should). However, we can also map the RIE-b format as long as M3 is hard-coded in the opcode (which is used by CRJ et al), or the RIE-c format as long as I2 is hard-coded (which is used by CIJ et al). We can use a test using one of those to ensure the operand encoding works correctly. [ I mentioned RxSBG, but those actually cannot be encoded since the .insn rie format always has a PC-relative operand. ]

RRF: all variants (RRF-a ... RRF-e) can be modeled (as long as can we pass a mask in a register field), so really any of the RRF instructions can be used here. Most of those only use three fields, so in order to test all four we should use one of floating-point insns like LEDBRA.

RSE: in z/Architecture those are all now RSY, really ... so to test that .insn rse works, we can use any of the RSY instructions but simply only provide a short displacement

SS: this is indeed a bit difficult ... there are many SS subformats, and the instructions we currently support in LLVM (MVC etc) use the 8-bit length field, which cannot easily be represented with the .insn ss format

zhanjunl retitled this revision from [SystemZ] Add support for .insn and .word/.short/.long/.quad directives to [SystemZ] Add support for the .insn directive..
zhanjunl updated this object.
zhanjunl edited edge metadata.

Reworked this diff to use MC instructions to parse and emit the directive. This allows handling of PC-relative operands for some of these directives. Also added some new instructions so that we can test the .insn directive for all formats.

zhanjunl marked 16 inline comments as done.Jul 26 2016, 1:49 PM

Thanks for the rework, this is looking much better now!

A couple of additional comments / requests:

Could you please split off adding those other insns like PR, MVCK, ... into a separate patch? This is really an independent change that can be reviewed separately and committed ahead of the .insn patch.

I don't really like the code duplication between the Insn / AsmInsn patterns. I can see why we need a isCodeGenOnly as well as a isAsmParserOnly pattern (so that the disassembler doesn't get confused), but those should then get generated from a multiclass template. See e.g. how this is done with the CompareBranches multiclass.

In addition, it doesn't seem a good idea to have the AsmInsn patterns recognize "instructions" like InsnE. This means that assembly files that contain the literal string InsnE will get recognized here, which is not really what we want. I'm wondering if we cannot simply also use ".insn e,..." as the string for the AsmInsn patterns as well -- if we do that, I think the auto-generated matchers should use literal string matching to detect the correct pattern to use, which means that the extra table getOpcodeFromFormat would no longer be necessary -- another duplication eliminated.

Your routine parseAnyGRFPRegister accepts integers, %rX and %fX. GAS in addition accepts %vX (for X < 16) as well.

For consistency, I think DirectiveInsnRSE should derive from (a newly created) InsnRSE, just like all the other DirectiveInsn patterns.

I don't really like the code duplication between the Insn / AsmInsn patterns. I can see why we need a isCodeGenOnly as well as a isAsmParserOnly pattern (so that the disassembler doesn't get confused), but those should then get generated from a multiclass template. See e.g. how this is done with the CompareBranches multiclass.

In addition, it doesn't seem a good idea to have the AsmInsn patterns recognize "instructions" like InsnE. This means that assembly files that contain the literal string InsnE will get recognized here, which is not really what we want. I'm wondering if we cannot simply also use ".insn e,..." as the string for the AsmInsn patterns as well -- if we do that, I think the auto-generated matchers should use literal string matching to detect the correct pattern to use, which means that the extra table getOpcodeFromFormat would no longer be necessary -- another duplication eliminated.

Following up on myself here, actually I don't think we even need the isCodeGenOnly variants at all. Simply having the isAsmParserOnly variants should suffice (we only need it for the asm parser after all ...).

Could you please split off adding those other insns like PR, MVCK, ... into a separate patch? This is really an independent change that can be reviewed separately and committed ahead of the .insn patch.

I'll split off the changes into a different diff.

I don't really like the code duplication between the Insn / AsmInsn patterns. I can see why we need a isCodeGenOnly as well as a isAsmParserOnly pattern (so that the disassembler doesn't get confused), but those should then get generated from a multiclass template. See e.g. how this is done with the CompareBranches multiclass.

In addition, it doesn't seem a good idea to have the AsmInsn patterns recognize "instructions" like InsnE. This means that assembly files that contain the literal string InsnE will get recognized here, which is not really what we want. I'm wondering if we cannot simply also use ".insn e,..." as the string for the AsmInsn patterns as well -- if we do that, I think the auto-generated matchers should use literal string matching to detect the correct pattern to use, which means that the extra table getOpcodeFromFormat would no longer be necessary -- another duplication eliminated.

I would have liked to use ".insn e" for the mnemonic as well, but from my understanding, TableGen removes all white spaces after parsing the string for tokens, so I couldn't find a way to get TableGen to generate ".insn e" as the asm mnemonic. Which is why I created two duplicated instructions. I'll try to do a bit more digging to see if there's another way around it.

Your routine parseAnyGRFPRegister accepts integers, %rX and %fX. GAS in addition accepts %vX (for X < 16) as well.

I wasn't sure whether vector registers should be recognized since there are no .insn directives for vector instructions, but I guess for GCC compatibility it makes sense. I'll fix this up.

For consistency, I think DirectiveInsnRSE should derive from (a newly created) InsnRSE, just like all the other DirectiveInsn patterns.

I'm not sure we want to have a newly created InsnRSE, since RSE isn't a valid instruction format and I don't want to confuse any future users into thinking it's a valid format to use.

In addition, it doesn't seem a good idea to have the AsmInsn patterns recognize "instructions" like InsnE. This means that assembly files that contain the literal string InsnE will get recognized here, which is not really what we want. I'm wondering if we cannot simply also use ".insn e,..." as the string for the AsmInsn patterns as well -- if we do that, I think the auto-generated matchers should use literal string matching to detect the correct pattern to use, which means that the extra table getOpcodeFromFormat would no longer be necessary -- another duplication eliminated.

I would have liked to use ".insn e" for the mnemonic as well, but from my understanding, TableGen removes all white spaces after parsing the string for tokens, so I couldn't find a way to get TableGen to generate ".insn e" as the asm mnemonic. Which is why I created two duplicated instructions. I'll try to do a bit more digging to see if there's another way around it.

Well, the mnemonic would be just ".insn", and "e" would be the first operand, decoded as fixed string. If you simply use ".insn e, ..." as an AsmParser pattern, you should see something like this in the generated SystemZGenAsmMatcher.inc file:

static const MatchEntry MatchTable0[] = {

{ 0 /* .insn */, SystemZ::AsmInsnE, Convert__U16Imm1_1, 0, { MCK_e, MCK_U16Imm }, },
{ 0 /* .insn */, SystemZ::AsmInsnRI, Convert__U32Imm1_1__AnyGRFP1_2__S16Imm1_3, 0, { MCK_ri, MCK_U32Imm, MCK_AnyGRFP, MCK_S16Imm }, },
{ 0 /* .insn */, SystemZ::AsmInsnRIE, Convert__U48Imm1_1__AnyGRFP1_2__AnyGRFP1_3__PCRel161_4, 0, { MCK_rie, MCK_U48Imm, MCK_AnyGRFP, MCK_AnyGRFP, MCK_PCRel16 }, },

where the MCK_e, MCK_ri etc are matched as string constants via matchTokenString.

For consistency, I think DirectiveInsnRSE should derive from (a newly created) InsnRSE, just like all the other DirectiveInsn patterns.

I'm not sure we want to have a newly created InsnRSE, since RSE isn't a valid instruction format and I don't want to confuse any future users into thinking it's a valid format to use.

Hmm. Maybe just use DirectiveInsnRSY then, since RSE is just the same as RSY except that only 12-bit displacements are used (which the insn pattern already enforces via addr12only).

zhanjunl added a comment.EditedAug 3 2016, 2:46 PM

Well, the mnemonic would be just ".insn", and "e" would be the first operand, decoded as fixed string. If you simply use ".insn e, ..." as an AsmParser pattern, you should see something like this in the generated SystemZGenAsmMatcher.inc file:

static const MatchEntry MatchTable0[] = {

{ 0 /* .insn */, SystemZ::AsmInsnE, Convert__U16Imm1_1, 0, { MCK_e, MCK_U16Imm }, },
{ 0 /* .insn */, SystemZ::AsmInsnRI, Convert__U32Imm1_1__AnyGRFP1_2__S16Imm1_3, 0, { MCK_ri, MCK_U32Imm, MCK_AnyGRFP, MCK_S16Imm }, },
{ 0 /* .insn */, SystemZ::AsmInsnRIE, Convert__U48Imm1_1__AnyGRFP1_2__AnyGRFP1_3__PCRel161_4, 0, { MCK_rie, MCK_U48Imm, MCK_AnyGRFP, MCK_AnyGRFP, MCK_PCRel16 }, },

where the MCK_e, MCK_ri etc are matched as string constants via matchTokenString.

Hmm, that table is used for matching the instruction after the instruction has been parsed. ParseInstruction uses the OperandMatchTable for parsing operands, which looks like this if I use the ".insn e,..." forms.

static const OperandMatchEntry OperandMatchTable[2076] = {
  { 0, 0 /* .insn */, MCK_BDAddr64Disp12, 4 /* 2 */ },
  { 0, 0 /* .insn */, MCK_AnyGRFP, 4 /* 2 */ },
  { 0, 0 /* .insn */, MCK_PCRel32, 8 /* 3 */ },
  { 0, 0 /* .insn */, MCK_AnyGRFP, 12 /* 2, 3 */ },

The code for parsing operands goes through that table and for every entry that matches the mnemonic, will try to parse the current operand based on the current entry. So for example,

.insn ri,0xa7080000,%r1,1

Based on the table, operand 3 ("1") could be a PCRel32 or a register, or an immediate. The parser will try to parse operand 3 as a PC-rel first because it's the first entry, which means that "1" could be parsed as a pc-rel operand, when it should have been an immediate. The same goes for immediate vs register, immediate vs address, etc. I'm not sure how to pick the right operand parser routine, and I'm not sure it's even possible using ParseInstruction based on the way the table is set up.

Unless there's something I'm missing, the only solution I could come up with is to write a custom parser that parses everything for the .insn directive, and use MatchTable0 to make sure that the operands are correct, but there would be some overlap in code with the other custom parser routines.

Hmm, that table is used for matching the instruction after the instruction has been parsed. ParseInstruction uses the OperandMatchTable for parsing operands, which looks like this if I use the ".insn e,..." forms.

Ah, I completely forgot about this two-step process. Yes, this is quite annyoing and would not work as I thought.

Unless there's something I'm missing, the only solution I could come up with is to write a custom parser that parses everything for the .insn directive, and use MatchTable0 to make sure that the operands are correct, but there would be some overlap in code with the other custom parser routines.

I don't see any really clever option here either. Probably the best way forward would be to go back to a custom operand parsing, based on a table similar to your original approach, where for each format you have a list of operand parser routines you call in sequence to collect all the operands. These routines should be the same that would also be called from the auto-generated operand parser, so e.g. for InsnRIL you'd call parseAnyGRFP and then parsePCRel32.

For code generation you'd then use the Insn* patterns (marked isCodeGenOnly) of your current approach.

This is still a bit annoying since the operand info is now still duplicated between the patterns and your special table. But I guess that's the best we can for now without major reworks to TableGen.

zhanjunl retitled this revision from [SystemZ] Add support for the .insn directive. to [SystemZ] Add support for the .insn directive.
zhanjunl edited edge metadata.

Created a custom table with operand information to parse the different formats, which calls the same parse* routines that are used by TableGen generated code.
Removed the AsmInsn* variants and allow vector registers to be recognized when parsing registers for the insn directive.

uweigand accepted this revision.Aug 5 2016, 2:28 PM
uweigand edited edge metadata.

Thanks for working through all those changes. This version looks good to me.

This revision was automatically updated to reflect the committed changes.
rogfer01 added inline comments.
llvm/trunk/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
512–520

This fails to build in Windows with Visual Studio 2013 with debugging enabled because of some debugging macro disaster in the std::equal_range used below.

Do you mind if I add this extra operator() like below?

  bool operator() (const InsnMatchEntry &LHS, const InsnMatchEntry &RHS) {
	  return LHS.Format < RHS.Format;
  }

I know it looks silly but this fails to build from source otherwise when debugging is enabled.

Thank you

zhanjunl added inline comments.Aug 10 2016, 8:50 AM
llvm/trunk/lib/Target/SystemZ/AsmParser/SystemZAsmParser.cpp
512–520

Sure, I don't see anything wrong with that. Sorry for the build break.