This is an archive of the discontinued LLVM Phabricator instance.

[mlir][LLVMIR] Add 'llvm.switch' op
ClosedPublic

Authored by modocache on Dec 9 2020, 11:52 PM.

Details

Summary

The LLVM IR 'switch' instruction allows control flow to be transferred
to one of any number of branches depending on an integer control value,
or a default value if the control does not match any branch values. This patch
adds llvm.switch to the MLIR LLVMIR dialect, as well as translation routines
for lowering it to LLVM IR.

To store a variable number of operands for a variable number of branch
destinations, the new op makes use of the AttrSizedOperandSegments
trait. It stores its default branch operands as one segment, and all
remaining case branches' operands as another. It also stores pairs of
begin and end offset values to delineate the sub-range of each case branch's
operands. There's probably a better way to implement this, since the
offset computation complicates several parts of the op definition. This is the
approach I settled on because in doing so I was able to delegate to the default
op builder member functions. However, it may be preferable to instead specify
skipDefaultBuilders in the op's ODS, or use a completely separate
approach; feedback is welcome!

Another contentious part of this patch may be the custom printer and
parser functions for the op. Ideally I would have liked the MLIR to be
printed in this way:

llvm.switch %0, ^bb1(%1 : !llvm.i32) [
  1: ^bb2,
  2: ^bb3(%2, %3 : !llvm.i32, !llvm.i32)
]

The above would resemble how LLVM IR is formatted for the 'switch'
instruction. But I found it difficult to print and parse something like
this, whether I used the declarative assembly format or custom functions.
I also was not sure a multi-line format would be welcome -- it seems
like most MLIR ops do not use newlines. Again, I'd be happy to hear any
feedback here as well, or on any other aspect of the patch.

Diff Detail

Event Timeline

modocache created this revision.Dec 9 2020, 11:52 PM
modocache requested review of this revision.Dec 9 2020, 11:52 PM
ftynse requested changes to this revision.Dec 10 2020, 3:19 AM

Thanks, this look good already, but I have a set of comments.

mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
603

Let's mention caseValues here as well.

604–605

Let's have a consistent naming scheme for attributes, please! Check what the rest of this file uses predominantly.

614

Nit: caseDestinations.empty()

616

Do we also want to check that the number of branch weights matches the number of successors?

630–660

This should live in a C++ file

631

No need to prefix SmallVector with llvm::, it is re-exported into the mlir:: namespace

650–652

Can't we rather accept an ArrayRef<int32_t> and avoid casting here?

Also, llvm::to_vector.

673–677

This attribute looks redundant with operand_segment_sizes. Furthermore, it is only used in the parser, but ignored in the builder.

682

Nit: you can return pair<0,0> or any other empty range and drop Optional.

686

Same as above.

mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
281–282

Nit: I'd rather assert that index is in bounds

289

This looks like the pair is "begin, length" and not "begin, end" as the documentation claims.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
733

Nit: please only use auto when it improves readability, i.e. when the type is obvious from RHS (a cast) or is very long/impossible to spell out (iterators, lambdas).

739–740

I'm pretty sure DenseIntElementsAttr has a way of obtaining the value without going through (the creation under lock of) IntegerAttr, maybe through iterators.

750

If it has to be i32 in LLVM IR, we should add a corresponding check to the op verifier.

This revision now requires changes to proceed.Dec 10 2020, 3:19 AM
rriddle added inline comments.Dec 10 2020, 12:35 PM
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
196

I think you could represent this with a mix of declarative assembly format and C++:

let assemblyFormat = [{
  $value `,` $defaultDestination (`(` $defaultOperands^ `)`)?
  `[` custom<SwitchOpCases>($caseDestinations, $caseOperands, type($caseOperands), $caseOperandOffsets) `]`
  attr-dict
}];
static ParseResult parseSwitchOpCases(OpAsmParser &parser,  SmallVectorImpl<Block *> &caseDestinations,
                                                                    SmallVectorImpl<OpAsmParser::OperandType> &caseOperands, 
                                                                    SmallVectorImpl<Type> &caseOperandTypes, ElementsAttr &caseOperandOffsets) {
  // Do the parsing of the following here in C++:
  //   1: ^bb2,
  //   2: ^bb3(%2, %3 : !llvm.i32, !llvm.i32)
}
static void printSwitchOpCases(OpAsmPrinter &printer, SwitchOp op, SuccessorRange caseDestinations, OperandRange caseOperands, TypeRange caseOperandTypes, 
                                                       ElementsAttr caseOperandOffsets) {
  // Do the printing of the following here in C++:
  //   1: ^bb2,
  //   2: ^bb3(%2, %3 : !llvm.i32, !llvm.i32)
}

The above might need slight tweaking given that I haven't tried to compile it.

For the newlines, I've had the thought for a while to allow operation parser to add newlines that get indented to the beginning of the operation line. I can add this in parallel to your change, and then you could pick it up when it lands.

modocache updated this revision to Diff 311001.Dec 10 2020, 1:09 PM
modocache marked 13 inline comments as done.

Thanks for the review, @ftynse!

I noticed I didn't explicitly call this out in my commit message, so I'll do so now: with this patch as-is, the llvm.switch MLIR op has stricter constraints that the LLVM IR switch instruction: the LLVM IR switch can be used with no case values, whereas MLIR llvm.switch currently cannot. The following LLVM IR switch behaves the same way as an unconditional br %default instruction would:

switch i32 %0, label %default [
]

However, with this patch as-is, we reject the equivalent MLIR, an llvm.switch op with no case values and no case destinations:

llvm.switch %0, ^bb1, [], []

It's not difficult to modify this patch to allow for a case-less llvm.switch MLIR op, but I wasn't sure it was a good idea to do so. It seems to me that any reasonable frontend could avoid emitting llvm.switch %0, ^bb1, [], [] and instead emit llvm.br ^bb1. So, for now, case-less llvm.switch MLIR ops are rejected, but if anyone disagrees, please let me know.

mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
673–677

Am I understanding you right that you're saying I ought to remove case_operand_offsets, and instead use operand_segment_sizes? I ask because case_operand_offsets *is* used in the builder, I think: it's passed along to a default build member function provided by MLIR TableGen. So I wanted to make sure.

I touched on this in my commit message:

To store a variable number of operands for a variable number of branch destinations, the new op makes use of the AttrSizedOperandSegments trait. It stores its default branch operands as one segment, and all remaining case branches' operands as another. It also stores pairs of begin and end offset values to delineate the sub-range of each case branch's operands. There's probably a better way to implement this, since the offset computation complicates several parts of the op definition. This is the approach I settled on because in doing so I was able to delegate to the default op builder member functions. However, it may be preferable to instead specify skipDefaultBuilders in the op's ODS, or use a completely separate approach; feedback is welcome!

I think it should be possible to rewrite this patch to use operand_segment_sizes instead of case_operand_offsets, but I want to make sure that's what you're requesting before I do, because you also commented on some of the offset calculation logic. All of that code will probably go away if I rewrite this to use operand_segment_sizes, I think -- I appreciate the tips all the same, but just thought I'd mention that in case you were envisioning a patch that removes the case_operand_offsets attribute but still calculates offsets.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
733

Will do, thanks! Based on your advice I left this alone: the type is a little long, llvm::Optional<mlir::ElementsAttr>, and the CondBrOp branch immediately above this also uses auto weights = ... for the same type, so auto here seemed consistent. That being said I'd be happy to change just this auto, or both of them, just let me know which you'd prefer.

I did change many other uses of auto in this patch to use the specific type name instead, and I agree it improves readability, so I appreciate the suggestion! (For example, for (int32_t caseValue : caseValues) ... instead of auto, in SwitchOp::build.)

750

I had thought that was taken care of by marking the argument as let arguments = (ins LLVM_i32:$value, ...); in the ODS for SwitchOp, but I may be mistaken. When I attempt to create an llvm.switch with an i1 value operand, I get:

error: 'llvm.switch' op operand #0 must be LLVM 32-bit integer type, but got '!llvm.i1'

I hadn't noticed that mlir/test/Dialect/LLVMIR/invalid.mlir tests these invalid cases. I added a test there that demonstrates the existing verification, please let me know if I should add anything to the verifier in addition.

Thanks for the review, @ftynse!

I noticed I didn't explicitly call this out in my commit message, so I'll do so now: with this patch as-is, the llvm.switch MLIR op has stricter constraints that the LLVM IR switch instruction: the LLVM IR switch can be used with no case values, whereas MLIR llvm.switch currently cannot. The following LLVM IR switch behaves the same way as an unconditional br %default instruction would:

switch i32 %0, label %default [
]

However, with this patch as-is, we reject the equivalent MLIR, an llvm.switch op with no case values and no case destinations:

llvm.switch %0, ^bb1, [], []

It's not difficult to modify this patch to allow for a case-less llvm.switch MLIR op, but I wasn't sure it was a good idea to do so. It seems to me that any reasonable frontend could avoid emitting llvm.switch %0, ^bb1, [], [] and instead emit llvm.br ^bb1. So, for now, case-less llvm.switch MLIR ops are rejected, but if anyone disagrees, please let me know.

If we do decide to go this direction, any variations from the LLVM operation should be documented in the description of the op. I'll defer to @ftynse for their preference here.

modocache added inline comments.Dec 10 2020, 1:13 PM
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
196

Oh, cool! I hadn't realized that custom C++ could be mixed into the declarative assembly format. I'll give that a try, thanks!

For the newlines, I've had the thought for a while to allow operation parser to add newlines that get indented to the beginning of the operation line. I can add this in parallel to your change, and then you could pick it up when it lands.

That sounds great, thanks! With your change I'd probably change the assembly format to include newlines.

modocache planned changes to this revision.Dec 11 2020, 8:12 AM
modocache added inline comments.
mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
673–677

Oooh, I see what you're saying now. I had thought that using operand_segment_sizes would obviate the need for this kind of offset calculation, but writing the patch now I can see that's not the case. Sorry about that, and thanks for the suggestion! I'll upload a better version of this patch in a few hours.

modocache requested review of this revision.Dec 11 2020, 10:20 AM
modocache added inline comments.
mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
673–677

Actually, sorry: I'm no longer sure operand_segment_sizes will work here. In OpOperandAdaptorEmitter::addVerification, a verify function is built for ops with the AttrSizedOperandSegments trait, which verifies that the number of operands matches the number of segment sizes. However, in the ODS for this patch, I specified the LLVM_SwitchOp with the following operands:

let arguments = (ins LLVM_i32:$value, // operand #1
                 Variadic<AnyType>:$defaultOperands, // #2
                 Variadic<AnyType>:$caseOperands, // #3
                 ArrayAttr:$case_values,
                 OptionalAttr<ElementsAttr>:$branch_weights);

So the SwitchOpAdaptor::verify member function is TableGen'ed to verify that the operand_segment_sizes is composed of 3 values. This doesn't match up with how Variadic<AnyType>:$caseOperands is being actually being used for the switch op: as a range of operand ranges. Whether I choose to amend this patch to add each case branch operand as an operand of the switch op, or each range of case operands as an operand, if the total number of switch operands isn't 3, the op will fail to verify:

llvm.switch %0, ^bb1, [1 : i32, 2 : i32, 3 : i32] [
  ^bb2(%1, %2: !llvm.i32, !llvm.i32),
  ^bb3(%3, %4, %5 : !llvm.i32, !llvm.i32, !llvm.i32)
  ^bb3(%6 : !llvm.i32)
]
// If each case operand is added as an operand of the switch op:
// error: 'operand_segment_sizes' attribute for specifying operand segments must have 3 elements, not 6
// If each range of case operands is added as an operand of the switch op:
// error: 'operand_segment_sizes' attribute for specifying operand segments must have 3 elements, not 4

I think the underlying mismatch here is that AttrSizedOperandSegments is intended to define the segment size of a compile-time-fixed number of variadic operands, whereas this switch op models a variable number of variadic operands. I'm not super knowledgeable about ODS, but I'm guessing that if I stick to using ODS, it's not possible to define an op using AttrSizedOperandSegments that has a variable, dynamic number of segments...?

I could look into defining an op outside of ODS, but I wonder if this inelegant case_operand_offsets approach is a simpler solution for my use case. As far as I can tell, adding a new trait for Variadic<Variadic<AnyType>> could be a large addition, whereas I'm looking to simply lower a C switch statement in my C MLIR to LLVM.

I'm adding an unrelated change to the AttrSizedOperandSegments verifier that prints how many elements it found in operand_segment_sizes. It's not necessarily part of my patch, but it helped me when debugging. I'd be happy to split it out into its own if anyone would prefer that.

Oops, forgot to clang-format.

Add to the test for operand segment sizes.

modocache added inline comments.Dec 11 2020, 12:55 PM
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
196

Is it possible, using this technique, to add an attribute to the SwitchOp from within the parseSwitchOpCases function? I can see that parsing attributes from within custom assembly format printer/parsers is possible thanks to the FormatCustomDirectiveAttributes test case in mlir/test/lib/Dialect/Test/TestOps.td. But that test uses OpAsmParser::parseAttribute to add an attribute to its op, whereas in this case, I need to collect all the case values, and then add an ArrayAttr to the switch op. From what I can tell, the custom parser functions are not passed an OperationState &, so they're not able to add attributes to it directly.

As a result, I'm not sure it's possible to parse case values in the way you described. 9Maybe the custom parser functions could be passed a reference to the state?)

rriddle added inline comments.Dec 11 2020, 1:04 PM
mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
619

This variable should be camelCase, same below.

mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
313

Can you just merge this function into getCaseOperandsMutable? getCaseOperands can just call the mutable version.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
342

Can you just use std::adjacent_find on all of the successors of terminator instead? That would cover all possible terminator types.

733

nit: Merge this variable into the if below:

if (auto weights = switchOp.branch_weights())
mlir/tools/mlir-tblgen/OpDefinitionsGen.cpp
2214 ↗(On Diff #311281)

nit: I'd change this to must have {1} elements, but got " as that better matches some of the other error messages we have.

Also, can you split this into a different change?

rriddle added inline comments.Dec 11 2020, 1:06 PM
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
196

The parse* function for the custom directive will be provided whatever elements you specify, so you can just add $case_values to that list. See here for the proper docs: https://mlir.llvm.org/docs/OpDefinitions/#custom-directives

rriddle added inline comments.Dec 11 2020, 1:09 PM
mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
604

Can you change this to an ElementsAttr instead? It's significantly more efficient for lists of integers.

modocache updated this revision to Diff 311306.Dec 11 2020, 1:16 PM

So, unfortunately I don't think it's feasible to use the declarative assembly format for this op, at least not without making some TableGen modifications to pass custom directives a reference to the OperationState. I'm also not certain it's possible to get rid of case_operand_offsets in favor of operand_segment_sizes, since this op seems unique in storing a variable list of variadic operands. But, aside from those two points, I think I've addressed all of your review comments so far. Thank you, @ftynse and @rriddle! Please take another look at this when you have some time.

So, unfortunately I don't think it's feasible to use the declarative assembly format for this op, at least not without making some TableGen modifications to pass custom directives a reference to the OperationState. I'm also not certain it's possible to get rid of case_operand_offsets in favor of operand_segment_sizes, since this op seems unique in storing a variable list of variadic operands. But, aside from those two points, I think I've addressed all of your review comments so far. Thank you, @ftynse and @rriddle! Please take another look at this when you have some time.

See my earlier comment for handling case_values, you just need to provide that to the custom directive in the declarative form to get a parameter for it in the C++ parser/printer functions. For operand_segment_sizes, that is specific to how ODS handles variadics, and given there is no support for variadics of variadics I don't think you'll be able to reuse that right now.

modocache planned changes to this revision.Dec 11 2020, 1:39 PM
modocache marked an inline comment as done.
modocache added inline comments.
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
196

Oh, excellent! I think I have it working. It was PEBKAC, I had previously passed in $case_values as you described, but in my parse* function definition I marked it ArrayAttr, not ArrayAttr&, and so I must have been assigning values to a copy. Thanks!

modocache updated this revision to Diff 311329.Dec 11 2020, 3:19 PM
modocache marked 7 inline comments as done.

Thanks for answering all my questions! I think I've responded to all of your feedback, let me know if I missed anything. I also have a question about parsing and printing integers that I'll post as an inline comment.

modocache added inline comments.Dec 11 2020, 3:24 PM
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
236

Here I'm parsing IntegerAttr, but ideally I'd like to parse integer literals. However, I didn't see a parser function to do so. Aesthetically I don't mind the op including the integer attribute type information (so cases look like [1 : i32 = ^bb1, ... ]), but the uglier part comes when I translate these into an ElementsAttr below...

257

...here I translate between parsed IntegerAttr, their values, and an i32 vector attribute. I don't know if it's possible to parse integer literals right now, but if not, do you know of a better way to convert an array of IntegerAttr into an ElementsAttr?

modocache updated this revision to Diff 311330.Dec 11 2020, 4:05 PM

Rebase past D92845, add instantiated SwitchInst to the branch mapping.

rriddle added inline comments.Dec 11 2020, 5:49 PM
mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
257

Not at the moment, IntegerAttr is the only way to currently get an integer. Sent out D93152 for this.

Also sent out D93151 for newline support.

Thanks for the review, @ftynse!

I noticed I didn't explicitly call this out in my commit message, so I'll do so now: with this patch as-is, the llvm.switch MLIR op has stricter constraints that the LLVM IR switch instruction: the LLVM IR switch can be used with no case values, whereas MLIR llvm.switch currently cannot. The following LLVM IR switch behaves the same way as an unconditional br %default instruction would:

switch i32 %0, label %default [
]

However, with this patch as-is, we reject the equivalent MLIR, an llvm.switch op with no case values and no case destinations:

llvm.switch %0, ^bb1, [], []

It's not difficult to modify this patch to allow for a case-less llvm.switch MLIR op, but I wasn't sure it was a good idea to do so. It seems to me that any reasonable frontend could avoid emitting llvm.switch %0, ^bb1, [], [] and instead emit llvm.br ^bb1. So, for now, case-less llvm.switch MLIR ops are rejected, but if anyone disagrees, please let me know.

If we do decide to go this direction, any variations from the LLVM operation should be documented in the description of the op. I'll defer to @ftynse for their preference here.

I would argue that it's not a good idea to call something "LLVM dialect" and diverge from LLVM IR. Yes, it has a lot of quirks. "Fixing" them in MLIR but not in LLVM IR only makes the situation worse. No one not only has to remember all the quirks, but also how they are different in MLIR. So I have a very strong preference for supporting everything LLVM IR can. We can add more "usability" features like not repeating the type excessively or conditional branching to the same block with different arguments, but we should not decrease expressivity.

mlir/include/mlir/Dialect/LLVMIR/LLVMOps.td
673–677

Okay, thanks for the explanation! I don't think adding infrastructural support because of one op is justified.

mlir/lib/Target/LLVMIR/ModuleTranslation.cpp
750

ODS indeed adds a verifier. And we don't check autogenerated verifiers in all uses, so no need to put this in invalid.mlir.

I suppose I reacted to getting the type unconditionally, I'd rather query the operand and get its type.

modocache planned changes to this revision.Dec 14 2020, 11:39 AM

Sounds good, thanks! I agree, I'll modify this patch later today such that the MLIR llvm.switch op accepts zero cases, just as LLVM IR switch instruction does.

modocache updated this revision to Diff 311701.Dec 14 2020, 2:02 PM
modocache marked 4 inline comments as done.

Thanks for the integer and newline patches @rriddle! I've updated this patch to use those new features. And thanks for another round of reviews @ftynse, I think I've responded to all the feedback given so far, let me know if I've missed anything.

modocache updated this revision to Diff 311709.Dec 14 2020, 2:22 PM

Fix comment typo in test.

ftynse accepted this revision.Dec 15 2020, 1:11 AM

Thanks for quick iteration!

This revision is now accepted and ready to land.Dec 15 2020, 1:11 AM
modocache updated this revision to Diff 311904.Dec 15 2020, 7:39 AM

Two small tweaks: give the BlockRange caseDestinations parameter of the op builder a default empty value, and "indent" cases before printing them.

Not at all, thank you for all the great feedback!

mlir/lib/Dialect/LLVMIR/IR/LLVMDialect.cpp
280

One last thing before I commit this: @rriddle, would it be possible to add an indent member function to the printer? I noticed that without adding spaces here, the op was being printed like this:

llvm.switch %0, ^bb1 [
1: ^bb2 // Oops! Not indented.
]

So I decided to add two spaces manually here, but in your addition of printNewline in D93151, you indent based on indentation widths that are shared throughout the printer's internals. If MLIR began indenting by 4, this switch op case printer would begin to look funny.

Let me know if you'd prefer this manual indentation to be cleaned up before this lands, and if you'd like me to try to write a patch to expose indenting to the printer's public interface.

I think I'll commit this later today, but let me know if you have any thoughts on adding an interface for indenting text to the printer, @rriddle. I'd be happy to try my hand at sending a patch to do so, as well.

This revision was landed with ongoing or failed builds.Dec 17 2020, 11:17 AM
This revision was automatically updated to reflect the committed changes.