This is an archive of the discontinued LLVM Phabricator instance.

TOSA-to-Linalg lowering for element-wise ops

Authored by rafaelubalmw on Jun 19 2023, 8:47 AM.


  • Wrote complete documentation for the Broadcastable op trait. This is mostly meant as a thorough description of its previous behavior, with the exception of minor feature updates.
  • Restricted legality criteria for a Broadcastable op in order to simplify current and future lowering passes and increase efficiency of code generated by those passes. New restriction are: 1) A dynamic dimension in an inferred result is not compatible with a static dimension in the actual result. 2) Broadcast semantics are restricted to input operands and not supported between inferred and actual result shapes.
  • Implemented TOSA-to-Linalg lowering support for unary, binary, tertiary element-wise ops. This support is complete for all legal cases described in the Broadcastable trait documentation.
  • Added unit tests for tosa.abs, tosa.add, and as examples of unary, binary, and tertiary ops.

Diff Detail

Event Timeline

rafaelubalmw created this revision.Jun 19 2023, 8:47 AM

Added documentation for the 'Broadcastable' trait.

  • Documentation
rafaelubalmw edited the summary of this revision. (Show Details)Jun 19 2023, 5:24 PM
rafaelubalmw added reviewers: sabauma, eric-k256.
This comment was removed by rafaelubalmw.
rafaelubalmw published this revision for review.Jun 19 2023, 5:26 PM

Created TableGen class 'Tosa_ElementwiseOp'.

Merged 'Tosa_ElemWiseUnaryOp' and 'Tosa_ElemWiseBinaryOp' into
'Tosa_ElementwiseOp'. Revisited all element-wise op definitions to
use this class when possible. Restrictions on element data types are
left out of 'Tosa_ElementiwseOp'.

This is complementary to something we have been working on.

Please see this RFC:
which lays out our proposal for (1) adding verifier for tosa broadcastable operators having equal ranks, and (2) use of EqualizeRanks helper
function to make operands have equal ranks when legalizing to tosa.

the EqualizeRanks function is already merged into llvm and available for use.

I see two differences: (1) EqualizeRanks strictly use the ranks of operands (and not that of the result) to determine what the equalized rank should be.
(2) it uses tosa::ReshapeOp to extend the ranks instead of tensor::ExtendShapeOp

also, see PR where we apply EqualizeRanks in tf/tfl legalization to tosa.

once the PR 60753 lands in tensorflow, we would then add the allranksmatchifknown trait to the tosa dialect.

hopefully we can converge to some common solution.

sabauma added inline comments.Jun 22 2023, 9:09 AM

This separation of behavior between static and dynamic dimensions is a little odd. In the current formulation of TOSA, tensor's static types and ranks are useful analysis results that allow for the generation of better code. Ideally, the behavior of an operator would not change by erasing rank/dimension information (it would just perform worse).

This specification changes that, by eliminating implicit broadcasting only for dynamic dimensions. For instance, if I erase the dimensions of the network below:

func.func @example_static(%arg0: tensor<2xf32>, %arg1: tensor<1xf32>) -> tensor<2xf32> {
  %0 = "tosa.add"(%arg0, %arg1) : (tensor<2xf32>, tensor<1xf32>) -> tensor<2xf32>
  return %0 : tensor<2xf32>

// to

func.func @example_dynamic(%arg0: tensor<?xf32>, %arg1: tensor<?xf32>) -> tensor<?xf32> {
  %0 = "tosa.add"(%arg0, %arg1) : (tensor<?xf32>, tensor<?xf32>) -> tensor<?xf32>
  return %0 : tensor<?xf32>

then you would reasonably expect that example_dynamic can operate on is a strict superset of the runtime values that example_static can operate on, but that is not the case (in fact, it cannot consume any of the runtime values that were legal inputs to example_static).

Another possible design would be to eliminate _implicit_ broadcasting entirely (maybe have an explicit broadcast operator).

Support for dynamic dimension broadcasting in TOSA-to-Linalg lowering of element-wise ops

Added comprehensive unit tests for TOSA element-wise lowering

Hi all,

This is an updated implementation for TOSA-to-Linalg lowering of unary/binary/ternary element-wise ops, according to the latest changes proposed in this RFC:

@mehdi_amini @eric-k256 @sabauma @mamrami @rsuderman @nicolasvasilache

Hi everyone. Just wanted to leave a quick note here as a reminder. Since these changes are quite widespread, it'll be increasingly likely for them to conflict with other work as time passes. Thanks everyone for your time and efforts.

Hi, this is a great useful feature. I just have a few question on it.


Sorry, could you explain it bit more? What does it mean by 1) Dynamic dimensions are never broadcast even if their runtime size is one. ? the dynamic dimension of tensor (marked as ?) does perform broadcasting in certain cases.


If I read this correctly, this trait can be shared with dialects those ops are broadcastable.
Do you think is this test case similar to the case below that is mentioned in

// Dynamic result type - doesn't make sense as is.
// How does the operation know what result to produce? 1x1xf32? 1x2xf32? etc.
// Resolving this would need an additional argument - see below.
%1 = stablehlo.broadcast_in_dim %arg0, dims = [0] : (tensor<1xf32>) -> tensor<1x?xf32>

The description change of this trait is great. I just wonder if it is possible to conflict with other dialects and so that increasing difficulty to use that trait?


I guess to make more systematic use of notifyMatchFailure would be better. Similar to the spots in this patch.


As mentioned by Tai Ly, would the merged EqualizeRanks can help here?

rafaelubalmw marked 5 inline comments as done.Jul 18 2023, 10:22 AM
rafaelubalmw added inline comments.

The comment you are referring to was a mistake in the summary of this Phabricator review, which I have now corrected. An initial version of this pull request was proposing to remove broadcast semantics from dynamic dimensions for simplicity of the lowering pass. However, several sources pointed out the importance of this feature, which was then added to the pull request.


The example in the external RFC you are pointing to seems to focus on the concept of "load bearing" dimensions. As I understand it, these are static dimensions in result types that provide crucial information to interpret the operation's behavior, and therefore, such dimensions should never be dynamic.

I'm inclined to avoid the introduction of load bearing dimensions when possible, as they obscure operation semantics with counterintuitive implicit behavior. This is precisely the reason why this pull request is proposing to forbid implicit broadcast semantics for the result. Once you know that such broadcast never occurs, you may still allow an inferred result to be of type tensor<1xf32> while the actual result is of type tensor<?xf32>. Here, you know the result has one element, but the programmer has decided to use a dynamic dimension in its type.

This flexibility may come in handy during a shape inference pass. In a given intermediate state, input shapes may have been resolved as static, but the output shape has not yet been updated until the current operation is processed. Such state does not violate operation invariants.

The Broadcastable trait is definitely usable in other dialects. As a matter of fact, it is currently used by the tf and tfl dialects in Tensorflow, in spite of its vague definition. The current definition has been designed keeping those particular dialects in mind. Regarding the sablehlo op you are pointing to, I'm afraid its syntax differs enough from element-wise TOSA/Tensorflow ops that it no longer serves as a good candidate for the adoption of the Broadcastable trait in the first place.


This is an intentional use of assert as opposed to notifyMatchFailure(). While the latter indicates the failure of a pass to process a plausible input, the former hints at a failure to guarantee logical invariants. Function expandRank() can never be invoked with a value for argument rank less than the current rank of argument tensor.


These are some key differences between the merged (1) ConversionUtils.h:EqualizeRanks() and (2) TosaToLinalg.cpp:equalizeRank().

  • (1) focuses on equalizing the ranks of two input values to the highest rank, while (2) matches all input ranks to the op result. This is intended to work for ops with any number of input arguments. To avoid confusion with naming, I have renamed (2) as expandInputRanks().

(1) is aimed for a TOSA-to-TOSA conversion pass, where rank expansion is carried out by a tosa.reshape op. While we could use recursive pattern application with (1), (2) gets us out of the TOSA dialect directly by emitting a tensor.expand_shape op instead. This op also takes additional dimension reassociation information, which is necessary when dealing with dynamic dimensions.

rafaelubalmw edited the summary of this revision. (Show Details)Jul 18 2023, 10:31 AM
rafaelubalmw edited the summary of this revision. (Show Details)
rafaelubalmw marked 4 inline comments as done.Jul 18 2023, 10:40 AM

Renamed equalizeRanks() to expandInputRanks() to avoid confusion with the new mlir::tosa::ExpandRank() function.

Thanks for the explanation. Your statement is clear, and I don't have further question.

eric-k256 accepted this revision.Jul 20 2023, 9:47 AM

Thanks for the updates. This looks good to me.

This revision is now accepted and ready to land.Jul 20 2023, 9:47 AM

Thank you, @eric-k256 . Mind landing it for me?

jpienaar added inline comments.

Nit: I think we primarily use ` for vars rather than * *



rafaelubalmw marked 2 inline comments as done.Jul 20 2023, 12:59 PM
rafaelubalmw added inline comments.

See section "Modification 2: Forbidding implicit dynamic-to-static dimension cast in result dimensions" in the RFC:

rafaelubalmw marked an inline comment as done.Jul 20 2023, 1:00 PM

Using backtick for variable names in

eric-k256 accepted this revision.Jul 21 2023, 8:36 AM

Looks okay to me. I'll land it unless there are any remaining issues.

This revision was automatically updated to reflect the committed changes.
jpienaar added inline comments.Jul 24 2023, 9:51 AM

Is this verified or checked before here?


Lets convert this to range based (I don't think index is used outside that). Also


nit: unranked to be consistent with rest.


I know I've run into this TF side (and it sounds like current integrate rotation is running into it), I mostly err on not failing to verify unless wrong. TF in particular I know one can explicitly set shapes in GraphDef (in some cases) and so you end up with weirdness post import of static shape in to identity op and then dynamic shape out or vice versa and then needing to have shape inference pass fix it up, so it is invalid post import until a cleanup.

eric-k256 added inline comments.Jul 24 2023, 4:05 PM

I wasn't aware of this, that's good to know.

I don't suppose there is a test suite that I can run to verify the TF side that would have caught this? I run the check-mlir tests within the repo right now when about to merge but no TF side tests.

rafaelubalmw added inline comments.Jul 26 2023, 7:20 AM

Does this mean we need to update the behavior for this case? If this restriction becomes problematic, a reasonable alternative would be allowing for implicit dynamic-to-static cast with undefined behavior if the resulting runtime dimension size does not match the given static size. We would continue to forbid implicit result broadcasting. Let me know if I should go ahead with this change.

Also, given that this patch already landed, should I be creating a new Phabricator review to address the latest comments?

jpienaar added inline comments.Jul 26 2023, 8:06 AM

I mean we could have patched into TF repo and ran ... but that's an unreasonable bar for external contributor to test all downstream projects. In most of these cases it could just be bad tests too (but there is a batch of them).

Yes to new patch. I like the alternative here, I think this is inline with and would mean that materializing runtime asserts that abort would be an allowed lowering.