Previously, broadcast was a binary op; now it supports an arbitrary number of inputs.
The change is structured so that, for now, it is an NFC for all broadcast
operations that were previously legal.
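As a rough illustration of the op-level change (a sketch in current MLIR assembly syntax; the exact syntax and types are assumptions, not taken from this patch):

```mlir
// The previously legal binary form keeps working unchanged:
%r2 = shape.broadcast %a, %b : !shape.shape, !shape.shape -> !shape.shape

// The op now also accepts more than two operands:
%r3 = shape.broadcast %a, %b, %c
    : !shape.shape, !shape.shape, !shape.shape -> !shape.shape
```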
mlir/include/mlir/Dialect/Shape/IR/ShapeOps.td | ||
---|---|---|
83–88 | Do we have places that use this form? | |
mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.cpp | ||
80 | The last sentence is true for all rewrites, so redundant here. | |
90 | A comment here would be good to explain why dim is used to get the rank. Also, does dim work on a shape? (It works on a tensor, so are we guaranteed we'd be in tensor world here? See the sketch after this table.) | |
188 | Could more be reused here? E.g., is the n-ary one (excluding computing the max rank) less efficient for the binary case? Or vice versa, are multiple binary-case applications less efficient than the n-ary lowering? |
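On the dim-for-rank question above, a minimal sketch, assuming the shape has already been lowered to an extent tensor (tensor<?xindex>) whose element count is the rank; written in current upstream syntax, which may differ from the code under review:

```mlir
func.func @rank_from_extent_tensor(%extents: tensor<?xindex>) -> index {
  %c0 = arith.constant 0 : index
  // The extent tensor is 1-D; its single dimension size equals the rank of
  // the value whose shape it describes.
  %rank = tensor.dim %extents, %c0 : tensor<?xindex>
  return %rank : index
}
```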
mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.cpp | ||
---|---|---|
115 | inBound is confusing. It is true if we are outside the bounds of the index, right? | |
130 | This sentence ends abruptly. | |
142 | You could fold this up. In the then case, return reduceDim. In the else case, do the select. | |
188 | Thinking about it, maybe doing multiple binary broadcasts in a row (in the implementation, not the op) would yield similar performance; see the sketch after this table. |
mlir/lib/Dialect/Shape/IR/Shape.cpp | ||
355 | Is this intended? |
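The chaining idea above, sketched at the op level (assumed current assembly syntax; broadcasting is associative wherever it is defined, so chained binary broadcasts should agree with the n-ary form):

```mlir
// N-ary form:
%r = shape.broadcast %a, %b, %c
    : !shape.shape, !shape.shape, !shape.shape -> !shape.shape

// Chained binary form; the same result wherever broadcasting succeeds:
%ab = shape.broadcast %a, %b : !shape.shape, !shape.shape -> !shape.shape
%r2 = shape.broadcast %ab, %c : !shape.shape, !shape.shape -> !shape.shape
```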
I'm going to wait on fixing the test until I hear your thoughts on the two options for lowering broadcast.
mlir/include/mlir/Dialect/Shape/IR/ShapeOps.td | ||
---|---|---|
83–88 | This was the original form of the build method before making the number of inputs variadic. I thought it might be nice to still have it for the common case and to make the transition essentially an NFC. | |
mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.cpp | ||
90 | This was guaranteed in the caller. I'll replicate the check here though in case of future uses from other locations. | |
188 | I think the performance differences between the two are negligible. We already fully unroll the tensor::GenerateOp, so they will be roughly the same. The n-ary lowering needs a starting value during its reductions (unless we make the C++ code much more complex), while the binary lowering can skip that step. On the other hand, the binary lowering might recompute a small amount of work between invocations when there are more than two inputs. Technically the n-ary case has the potential to be just as performant; I'm just not sure how clean I can make it look. I personally find the n-ary lowering easier to read, and I think Stephan has said in the past that he expected the binary case to be easier to read. I think we should choose the implementation we find easier to read and stick with it. I would like to hear your opinions on whether you agree, and which you find easier to read. |
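To make the "starting value" point concrete: per result dimension, the n-ary lowering effectively folds a max over each operand's right-aligned extent, with 1 serving both as the seed and as the contribution of operands that don't have that dimension. A hedged sketch of the per-operand combination step, using current arith syntax rather than what the patch emits inside its tensor.generate, with a made-up function name:

```mlir
// Fold one operand's contribution into the running max for a single result
// dimension. %ext is that operand's extent, %out_of_bounds is true when the
// operand is too short to have this dimension, and %acc starts at 1 (the
// seed referred to above).
func.func @combine_extent(%acc: index, %ext: index, %out_of_bounds: i1) -> index {
  %c1 = arith.constant 1 : index
  %ext_or_one = arith.select %out_of_bounds, %c1, %ext : index
  %gt = arith.cmpi ugt, %ext_or_one, %acc : index
  %max = arith.select %gt, %ext_or_one, %acc : index
  return %max : index
}
```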
mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.cpp | ||
---|---|---|
188 | I found the binary case easier because my n-ary case looked like a mess. Looking at your code, this is much nicer, so let's ship the n-ary case only. The performance difference should be negligible. Could you also extend cstr_broadcastable accordingly? They need to be in sync, otherwise broadcast cannot really be used. |
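For reference, the constraint op would need to grow the same arity so the two stay in sync; a sketch under the assumption that cstr_broadcastable keeps its current assembly format:

```mlir
// Witness that %a, %b, and %c are all broadcastable with each other:
%w = shape.cstr_broadcastable %a, %b, %c
    : !shape.shape, !shape.shape, !shape.shape
```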
mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.cpp | ||
---|---|---|
188 | Binary is removed. I'll extend cstr_broadcastable in a follow-up CL. |