This is an archive of the discontinued LLVM Phabricator instance.

[MIR] Target specific MIR formating and parsing
ClosedPublic

Authored by pguo on Nov 4 2019, 6:42 PM.

Details

Summary

Added MIRFormatter for target specific MIR formating and parsing with
immediate and custom pseudo source values. Target machine can subclass
MIRFormatter and implement custom logic for printing and parsing
immediate and custom pseudo source values for better readability.

  • Target specific immediate mnemonic need to start with "." follows by identifier string. When MIR parser sees immediate it will call target specific parsing function.
  • Custom pseudo source value need to start with custom follows by double-quoted string. MIR parser will pass the quoted string to target specific PSV parsing function.
  • MIRFormatter have 2 helper functions to facilitate LLVM value printing and parsing for custom PSV if they refers LLVM values.

Event Timeline

pguo created this revision.Nov 4 2019, 6:42 PM
Herald added a project: Restricted Project. · View Herald TranscriptNov 4 2019, 6:42 PM

Could use some tests for the various parse error conditions

llvm/include/llvm/CodeGen/MIRFormatter.h
2

Should pad this out to be the same length as the other lines

59

Typo Implemente

llvm/lib/CodeGen/MIRParser/MIParser.cpp
1536

Capitalize and period

2613–2617

Should this go in the PerTargetMIParsingState?

3220

Initialize to null at the function entry and remove the = nullptrs throughout the function?

3231

Extra blank lines between the switch cases

3243

Capitalize

3260

Typo unse

llvm/lib/CodeGen/MachineOperand.cpp
484–488

I don't see the point of adding this function? This probably shouldn't be leaking the MIRPrinter into MachineOperand.cpp

1169

Single quotes

llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.struct.buffer.store.ll
114

I'm surprised to see these in a quoted form? Is that just the default behavior without implementing something?

Thanks for doing this. It should make a lot of things easier to read.

In addition to Matt's comments, I have a few other minor comments.

llvm/include/llvm/CodeGen/MIRFormatter.h
37

This is just an optional nitpick. Using ~0 is fine but if you used Optional<unsigned> then you wouldn't have to explain that. It does take a little more storage though so I don't mind if you stick to ~0

74

Typo? MMIR

llvm/include/llvm/CodeGen/MachineOperand.h
292

Capitalization Nitpick: whether -> Whether.
There's a few other cases of this kind of thing throughout the patch

293

Can we avoid the 'Sometimes' in that sentence? It makes it sound like a bug or a random chance but I think you intended it to mean that 'def' is definitely not printed when that condition is met.

pguo updated this revision to Diff 229196.Nov 13 2019, 3:49 PM
pguo marked 3 inline comments as done.
  • Fix based on code review.
pguo marked 11 inline comments as done.Nov 13 2019, 4:03 PM
pguo added inline comments.
llvm/include/llvm/CodeGen/MachineOperand.h
292

These are from old commits and I tend not to change them and just follow the convention.

293

The "sometimes" are from old commits, sorry I made some formatting changes in previous commit and now I revert them.

llvm/lib/CodeGen/MIRParser/MIParser.cpp
2613–2617

The parseTargetImmMnemonic will access current Token and call lex(), so IMO it's better to put it in MIParser class.

llvm/lib/CodeGen/MachineOperand.cpp
484–488

The idea is a helper function to print IR value in MIR serialization format, which can be used by target custom pseudo source value. The target custom pseudo source value can hold IR value reference, it's better to print/parse it in a standard way.

1169

Do you mean back quote here? I intentionally use double quotes here for custom pseudo source value, since back quote ... is used for quoted IR value, which could be part of the serialization output. The design is custom pseudo source value is wrapped in double quoted string which will be passed to target specific formatter. Inside the string it can have back quoted IR value as well as other customized text.

llvm/test/CodeGen/AMDGPU/GlobalISel/regbankselect-amdgcn.struct.buffer.store.ll
114

Yes. The quote is used to find the start and end of a custom pseudo source value string representation, and pass the whole string to target specific parser. The back quote is used for IR const value (QuotedIRValue in MIR lexer), so I choose double quote which will be treated as string constant by MIR lexer.

pguo marked an inline comment as done.Nov 13 2019, 4:22 PM
pguo added inline comments.
llvm/include/llvm/CodeGen/MIRFormatter.h
37

I think the OpIdx here is not optional but required to get the correct functionally. The tuple <OpCode, OpIdx> will uniquely tell formatter how to format the immediate in a target specific way. While since we can do dump() on any MachineOperand object inside debugger, in this case we don't know the Index, so we have to pass ~0 to the real print function to mark it as unknown.

dsanders added inline comments.Nov 13 2019, 5:18 PM
llvm/lib/CodeGen/MachineOperand.cpp
1169

I believe Matt means that it's a single character string so he's asking for '\"'

pguo updated this revision to Diff 229239.Nov 13 2019, 11:35 PM
  • Using single quote when outputing char.
pguo marked 2 inline comments as done.Nov 13 2019, 11:37 PM
pguo added inline comments.
llvm/lib/CodeGen/MachineOperand.cpp
1169

Thank you Daniel for clarification.

arsenm added inline comments.Nov 13 2019, 11:52 PM
llvm/include/llvm/CodeGen/TargetInstrInfo.h
1795–1799

This requires the target implementation to recreate the null check and construct pattern Make this non-virtual and have a virtual getMIRFormatterImpl ?

arsenm added inline comments.Nov 14 2019, 12:05 AM
llvm/include/llvm/CodeGen/TargetInstrInfo.h
1795–1799

I also just realized this should really go in the TargetMachine, not TargeInstrInfo, similar to the target specific YAML functions. The name should also be createMIRFormatter for consistency

pguo updated this revision to Diff 229427.Nov 14 2019, 4:54 PM
pguo marked an inline comment as done.
  • Move MIRFormatter to TargetMachine

Move MIRFormatter from TargetInstrInfo to TargetMachine.
Move slot to value map from MIParser to PerFunctionMIParsingState.
Some code refactoring.

pguo marked 2 inline comments as done.Nov 14 2019, 4:59 PM
pguo added inline comments.
llvm/include/llvm/CodeGen/TargetInstrInfo.h
1795–1799

Thank you for pointing me the target specific YAML functions, I didn't aware of them before. Yes, I agree these target specific MIR serialization/de-serialization functions should be put together. I moved to TargetMachine, while I still use getMIRFormatter and get rid of the lazy creation (since I think this object is pretty light). Maybe we should move those target specific YAML functions into MIRFormatter?

pguo marked an inline comment as done.Nov 20 2019, 4:51 PM

Ping.

dsanders accepted this revision.Dec 3 2019, 1:27 PM

LGTM from me.

llvm/include/llvm/CodeGen/MIRFormatter.h
37

I think there may be some confusion here so just to clarify: Optional<unsigned> is essentially the same as what you're doing but the unknown case is more explicit and self documenting as it doesn't need a number to be carved out of unsigned and documented as being 'unknown':

printImm(OS, MI, ~0 /* Unknown */, Imm)

becomes:

printImm(OS, MI, None, Imm)
This revision is now accepted and ready to land.Dec 3 2019, 1:27 PM
pguo updated this revision to Diff 232228.Dec 4 2019, 3:34 PM
  • [MIR] Target specific MIR formatting and parsing

Use Optional for OpIdx argument.

pguo marked 2 inline comments as done.Dec 4 2019, 3:35 PM
pguo added inline comments.
llvm/include/llvm/CodeGen/MIRFormatter.h
37

Got it. I switched to use Optional. Thanks.

pguo marked an inline comment as done.Dec 4 2019, 3:49 PM

@arsenm, any comments on recent changes on moving the MIR formatter to TargetMachine? Thank you.

I'm just going through the remaining issues from Matt and I don't think there's anything that should block committing this. AFAICT there's only one issue still open and it's the one about reusing the static printIRValueReference() in MachineOperand.cpp causing one function of MIRFormatter to be implemented in that file as well.

llvm/lib/CodeGen/MachineOperand.cpp
484–488

I agree with printing/parsing it the same way but it is a bit odd to have part of the MIRFormatter implementation in this file. Is there somewhere that printIRValueReference() could be to make it available for both MachineOperand.cpp and a newly created MIRFormatter.cpp?

If not, I don't think it's a huge problem to have this function here. Both objects are implemented in the same library and I've seen this kind of thing elsewhere with some of the reader/writer implementations.

pguo updated this revision to Diff 234415.Dec 17 2019, 4:49 PM
  • Move MIRFormatter::printIRValue to MIRPrinter.cpp
pguo marked 2 inline comments as done.Dec 17 2019, 4:52 PM
pguo added inline comments.
llvm/lib/CodeGen/MachineOperand.cpp
484–488

Moved this function to MIRPrinter.cpp which to me seems like a better location.

pguo marked an inline comment as done.Jan 7 2020, 5:40 PM

Would someone commit this for me, I don't have commit access yet. Thanks.

This revision was automatically updated to reflect the committed changes.

Hmm, cause the sanity failure.

/home/qshanz/work/commit/build/bin/llc -mtriple=thumbv7-none-linux-gnueabi -debug -o /dev/null < /home/qshanz/work/commit/llvm/test/CodeGen/ARM/gpr-paired-spill-thumbinst.ll 2>&1 | /home/qshanz/work/commit/build/bin/FileCheck /home/qshanz/work/commit/llvm/test/CodeGen/ARM/gpr-paired-spill-thumbinst.ll
/home/qshanz/work/commit/llvm/test/CodeGen/ARM/gpr-paired-spill-thumbinst.ll:19:10: error: CHECK: expected string not found in input
; CHECK: t2STRDi8
         ^
<stdin>:1:1: note: scanning from here
Args: /home/qshanz/work/commit/build/bin/llc -mtriple=thumbv7-none-linux-gnueabi -debug -o /dev/null
^
<stdin>:21:25: note: possible intended match here
.. opcode 39 is aliased to 38
thakis added a comment.Jan 8 2020, 7:54 PM

Temporarily reverted in 71d64f72f934631aa2f for now.

Temporarily reverted in 71d64f72f934631aa2f for now.

Recommitted with a fix in de3d0ee023c. There was an unguarded nullptr dereference in a function that allows nullptr

kadircet added inline comments.
llvm/lib/Target/TargetMachine.cpp
15

this seems to be causing a layering violation, as codegen already depends on Target. Is it possible to restructure this change in a way to prevent this cycle?
This doesn't brake cmake builds but is causing problems in strict build systems.

pguo marked 2 inline comments as done.Jan 10 2020, 10:23 AM
pguo added inline comments.
llvm/lib/Target/TargetMachine.cpp
15
dsanders added inline comments.
llvm/lib/Target/TargetMachine.cpp
15

Peng was working on fixing that yesterday. It looks like it landed in https://reviews.llvm.org/rGcfd849840134 (Thanks @bkramer for committing it).