Add support for the .insn directive.
.insn is an s390 specific directive that allows encoding of an instruction
instead of using a mnemonic. The motivating case is some code in node.js that
requires support for the .insn directive.
Differential D21809
[SystemZ] Add support for the .insn directive • zhanjunl on Jun 28 2016, 11:46 AM. Authored by
Details Add support for the .insn directive. .insn is an s390 specific directive that allows encoding of an instruction
Diff Detail
Event Timeline
Comment Actions See inline comments. Also, this might not be the good level to have .insn handling - for one, we forfeit any hope of handling fixups in immediate operands (which is necessary to handle pc-relative forms correctly). It might be better to represent these directives as special machine instructions, one per form. Comment Actions Oh, and the .word change should be split off (it'll likely be a one-liner using the alias functionality)
Comment Actions From my understanding, gas only supports some of the possible s390 instruction formats when using ".insn". For example, the "ri" format in the insn directive corresponds to the RI-a format, which uses an immediate field and not a relative-immediate field. Therefore, for the .insn directive specifically: ri = RI-a I tried assembling a sample .insn directive with gas: .text .globl main .type main,@function main: lghi %r2, 0 # brct %r2,.label .insn ri,0xa7060000,%r2,.label .label: br %r14 .Lfunc_end0: .size main, .Lfunc_end0-main gas will assemble the above code with no errors, but when I do an objdump of the object file, I see: temp.o: file format elf64-s390 Disassembly of section .text: 0000000000000000 <main>: 0: a7 29 00 00 lghi %r2,0 4: a7 26 00 00 brct %r2,4 <main+0x4> 0000000000000008 <.label>: 8: 07 fe br %r14 a: 07 07 nopr %r7 So it looks like it was able to get the correct instruction mnemonic, but it silently fails when trying to encode the label into the instruction. This was my understanding when supporting the .insn, and so I didn't expect to have to do any fixups. Unless I'm mistaken? Comment Actions Once I figure out if we're sticking with this implementation, I'll continue working on this and fix up all the comments that have been made. Thanks for the quick and thorough review guys!
Comment Actions I was refering to RIE, RIL, RSI with my comments about pc-relative operands, not RI. Here goes (gas): nopr %r7 .insn rie, 0xec0000000044, 1, 2, label nopr %r7 label: Disassembly of section .text: 0000000000000000 <label-0xa>: 0: 07 07 nopr %r7 2: ec 12 00 04 00 44 brxhg %r1,%r2,a <label> 8: 07 07 nopr %r7 000000000000000a <label>: a: 07 07 nopr %r7 So this is definitely supported.
Again, this is RI, which gas considers to be RI-a, which is not PC-relative. Also, please disassemble your code with objdump -dr: Disassembly of section .text: 0000000000000000 <main>: 0: a7 29 00 00 lghi %r2,0 4: a7 26 00 00 brct %r2,4 <main+0x4> 6: R_390_16 .text+0x8 0000000000000008 <.label>: 8: 07 fe br %r14 a: 07 07 nopr %r7 Here goes the fixup. Comment Actions
Well, sort of ... it is true that each .insn format corresponds to one specific s390 subformat, but for the most part, you can also use .insn to encode other subformats, e.g. by passing a mask in a register field, or by setting a field to 0. Existing code using inline asm does that ... However, there are a few subformats that indeed currently cannot be encoded, e.g. RI-b or RI-c, the RI variants with a PC-relative operand. So my earlier comment was indeed not quite correct. Looking at those formats again: RIE: .insn directly models RIE-e, which is only used by BRXLG/BRXHG, which LLVM does not current have (but probably should). However, we can also map the RIE-b format as long as M3 is hard-coded in the opcode (which is used by CRJ et al), or the RIE-c format as long as I2 is hard-coded (which is used by CIJ et al). We can use a test using one of those to ensure the operand encoding works correctly. [ I mentioned RxSBG, but those actually cannot be encoded since the .insn rie format always has a PC-relative operand. ] RRF: all variants (RRF-a ... RRF-e) can be modeled (as long as can we pass a mask in a register field), so really any of the RRF instructions can be used here. Most of those only use three fields, so in order to test all four we should use one of floating-point insns like LEDBRA. RSE: in z/Architecture those are all now RSY, really ... so to test that .insn rse works, we can use any of the RSY instructions but simply only provide a short displacement SS: this is indeed a bit difficult ... there are many SS subformats, and the instructions we currently support in LLVM (MVC etc) use the 8-bit length field, which cannot easily be represented with the .insn ss format Comment Actions Reworked this diff to use MC instructions to parse and emit the directive. This allows handling of PC-relative operands for some of these directives. Also added some new instructions so that we can test the .insn directive for all formats. Comment Actions Thanks for the rework, this is looking much better now! A couple of additional comments / requests: Could you please split off adding those other insns like PR, MVCK, ... into a separate patch? This is really an independent change that can be reviewed separately and committed ahead of the .insn patch. I don't really like the code duplication between the Insn / AsmInsn patterns. I can see why we need a isCodeGenOnly as well as a isAsmParserOnly pattern (so that the disassembler doesn't get confused), but those should then get generated from a multiclass template. See e.g. how this is done with the CompareBranches multiclass. In addition, it doesn't seem a good idea to have the AsmInsn patterns recognize "instructions" like InsnE. This means that assembly files that contain the literal string InsnE will get recognized here, which is not really what we want. I'm wondering if we cannot simply also use ".insn e,..." as the string for the AsmInsn patterns as well -- if we do that, I think the auto-generated matchers should use literal string matching to detect the correct pattern to use, which means that the extra table getOpcodeFromFormat would no longer be necessary -- another duplication eliminated. Your routine parseAnyGRFPRegister accepts integers, %rX and %fX. GAS in addition accepts %vX (for X < 16) as well. For consistency, I think DirectiveInsnRSE should derive from (a newly created) InsnRSE, just like all the other DirectiveInsn patterns. Comment Actions Following up on myself here, actually I don't think we even need the isCodeGenOnly variants at all. Simply having the isAsmParserOnly variants should suffice (we only need it for the asm parser after all ...). Comment Actions I'll split off the changes into a different diff.
I would have liked to use ".insn e" for the mnemonic as well, but from my understanding, TableGen removes all white spaces after parsing the string for tokens, so I couldn't find a way to get TableGen to generate ".insn e" as the asm mnemonic. Which is why I created two duplicated instructions. I'll try to do a bit more digging to see if there's another way around it.
I wasn't sure whether vector registers should be recognized since there are no .insn directives for vector instructions, but I guess for GCC compatibility it makes sense. I'll fix this up.
I'm not sure we want to have a newly created InsnRSE, since RSE isn't a valid instruction format and I don't want to confuse any future users into thinking it's a valid format to use. Comment Actions Well, the mnemonic would be just ".insn", and "e" would be the first operand, decoded as fixed string. If you simply use ".insn e, ..." as an AsmParser pattern, you should see something like this in the generated SystemZGenAsmMatcher.inc file: static const MatchEntry MatchTable0[] = { { 0 /* .insn */, SystemZ::AsmInsnE, Convert__U16Imm1_1, 0, { MCK_e, MCK_U16Imm }, }, { 0 /* .insn */, SystemZ::AsmInsnRI, Convert__U32Imm1_1__AnyGRFP1_2__S16Imm1_3, 0, { MCK_ri, MCK_U32Imm, MCK_AnyGRFP, MCK_S16Imm }, }, { 0 /* .insn */, SystemZ::AsmInsnRIE, Convert__U48Imm1_1__AnyGRFP1_2__AnyGRFP1_3__PCRel161_4, 0, { MCK_rie, MCK_U48Imm, MCK_AnyGRFP, MCK_AnyGRFP, MCK_PCRel16 }, }, where the MCK_e, MCK_ri etc are matched as string constants via matchTokenString.
Hmm. Maybe just use DirectiveInsnRSY then, since RSE is just the same as RSY except that only 12-bit displacements are used (which the insn pattern already enforces via addr12only). Comment Actions Hmm, that table is used for matching the instruction after the instruction has been parsed. ParseInstruction uses the OperandMatchTable for parsing operands, which looks like this if I use the ".insn e,..." forms. static const OperandMatchEntry OperandMatchTable[2076] = { { 0, 0 /* .insn */, MCK_BDAddr64Disp12, 4 /* 2 */ }, { 0, 0 /* .insn */, MCK_AnyGRFP, 4 /* 2 */ }, { 0, 0 /* .insn */, MCK_PCRel32, 8 /* 3 */ }, { 0, 0 /* .insn */, MCK_AnyGRFP, 12 /* 2, 3 */ }, The code for parsing operands goes through that table and for every entry that matches the mnemonic, will try to parse the current operand based on the current entry. So for example, .insn ri,0xa7080000,%r1,1 Based on the table, operand 3 ("1") could be a PCRel32 or a register, or an immediate. The parser will try to parse operand 3 as a PC-rel first because it's the first entry, which means that "1" could be parsed as a pc-rel operand, when it should have been an immediate. The same goes for immediate vs register, immediate vs address, etc. I'm not sure how to pick the right operand parser routine, and I'm not sure it's even possible using ParseInstruction based on the way the table is set up. Unless there's something I'm missing, the only solution I could come up with is to write a custom parser that parses everything for the .insn directive, and use MatchTable0 to make sure that the operands are correct, but there would be some overlap in code with the other custom parser routines. Comment Actions Ah, I completely forgot about this two-step process. Yes, this is quite annyoing and would not work as I thought.
I don't see any really clever option here either. Probably the best way forward would be to go back to a custom operand parsing, based on a table similar to your original approach, where for each format you have a list of operand parser routines you call in sequence to collect all the operands. These routines should be the same that would also be called from the auto-generated operand parser, so e.g. for InsnRIL you'd call parseAnyGRFP and then parsePCRel32. For code generation you'd then use the Insn* patterns (marked isCodeGenOnly) of your current approach. This is still a bit annoying since the operand info is now still duplicated between the patterns and your special table. But I guess that's the best we can for now without major reworks to TableGen. Comment Actions Created a custom table with operand information to parse the different formats, which calls the same parse* routines that are used by TableGen generated code.
|
This fails to build in Windows with Visual Studio 2013 with debugging enabled because of some debugging macro disaster in the std::equal_range used below.
Do you mind if I add this extra operator() like below?
I know it looks silly but this fails to build from source otherwise when debugging is enabled.
Thank you