This is an archive of the discontinued LLVM Phabricator instance.


Differential D89325

[mlir] Add partial lowering of shape.cstr_broadcastable.
ClosedPublic

Authored by tpopp on Oct 13 2020, 8:57 AM.


Details

Reviewers

herhut
silvas

Commits

rGd05d42199f77: [mlir] Add partial lowering of shape.cstr_broadcastable.

Summary

Because cstr operations allow more instruction reordering than asserts, we only
lower cstr_broadcastable to std ops with cstr_require. This ensures that the
more drastic lowering to asserts happens only when the user explicitly requests it.

Diff Detail

Repository: rG LLVM Github Monorepo

Event Timeline

tpopp created this revision. Oct 13 2020, 8:57 AM

Herald added a project: Restricted Project. Oct 13 2020, 8:57 AM

Herald added subscribers: rdzhabarov, tatianashp, msifontes and 13 others.

tpopp requested review of this revision. Oct 13 2020, 8:57 AM

Herald added subscribers: stephenneuendorffer, nicolasvasilache. Oct 13 2020, 8:57 AM

Please let me know what you think of this. I think lowering away the "broadcast" logic and keeping the "cstr" logic in ShapeToStd fits with the purpose of the pass, but that is debatable.

Harbormaster completed remote builds in B74945: Diff 297878. Oct 13 2020, 9:13 AM

This is a very interesting direction! I was thinking of something similar. Let's discuss this at our shape meeting on Thursday.

If we land this, we should remove the similar code from LowerShapeConstraints.

We discussed this and I think agreed to support the lowering via RequireOp. We have to find one place for these patterns, though.

Do we want to refactor the patterns in LowerShapeConstraints?

The result of our discussion was to do

(a) assume(cstr_broadcastable(x, y)) =>
(b) assume(cstr_requires(is_broadcastable(x, y), “cannot broadcast”)) =>
(c) assume(cstr_requires(scf.for( … ), “cannot broadcast”))

I think that we can reformulate LowerShapeConstraints to be (a)->(b).

Then we have ShapeToStd implement is_broadcastable->scf.for ((b)->(c)). Not sure if there is too much value in doing (b)->(c) in the absence of other shape-to-std patterns (Thoughts?).

We won't necessarily need to have a pattern that lowers cstr_requires upstream, though I think that having a standalone pass that lowers it to std.assert would be ok to have but not required.

Rework to lower to IsBroadcastable which is further lowered.

Herald added a subscriber: mgorny. Oct 29 2020, 9:40 AM

Reformat each block of code in own namespace.

Harbormaster completed remote builds in B76934: Diff 301659. Oct 29 2020, 9:45 AM

Harbormaster completed remote builds in B76936: Diff 301662.

silvas added inline comments. Oct 29 2020, 3:09 PM

mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.td
23	I think that this decomposition is useful independently. Can we have "decomposeConstraints" pass / populateDecomposeConstraintsPatterns or something?

silvas requested changes to this revision. Oct 29 2020, 3:12 PM

silvas added inline comments.

mlir/test/Conversion/ShapeToStandard/shape-to-standard.mlir
439	This test is really unfortunate, since it is basically a test of the pattern you are introducing in this patch together with the pattern in https://reviews.llvm.org/D90407. In a separate pass, we would avoid this, since we would just observe the cstr_broadcastable -> cstr_require(is_broadcastable, "msg") expansion. We don't want to test these same 20+ lines of broadcast-to-std more times than we absolutely have to :)

This revision now requires changes to proceed. Oct 29 2020, 3:12 PM

tpopp added inline comments. Oct 30 2020, 3:07 AM

mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.td
23	What use are you thinking of? My original logic for lowering `cstr_broadcastable` in this pass was that it fits the intention of this pass in removing special Shape computations. Creating a separate pass would then mean that ShapeToStandard is not actually replacing all Shape computations and that users would have to know to first do a separate lowering. This also gets annoying because we are making more code bloat by having to define/call more passes (with a required order) and more populate functions:

DecomposeConstraints()
ShapeToStandard()
ConstraintsToAsserts()

Also, having a separate pass for decomposing constraints runs the risk of users decomposing early and losing some of the reasoning benefits in these `cstr_` operations.

If this doesn't belong here, to be safe, I would rather only have the lowering in `ConvertShapeConstraints` (although note we'd have to rerun `ShapeToStandard` after that). If you have some concrete example, I could be convinced. I'm just worried that a composability mindset might, in this case, cause both `ShapeToStandard` and `Cstr*Op`s to be left in a state different from their intended purposes.
mlir/test/Conversion/ShapeToStandard/shape-to-standard.mlir
439	See my comment above. I think we need a good reason to split these in separate passes. A separate pass would require more than 20 lines split across ~4 files which would then be more lines than this test case. I do agree with the repeated test being unfortunate. I'm hoping to discuss the possibility of having a configuration so that operations aren't recursively legalized. This would be useful for tests where it's nice to have multiple patterns compose for full lowerings, but we also would like to test individual patterns.

silvas added inline comments. Oct 30 2020, 12:05 PM

mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.td
23	The assumption of having is_broadcastable in the first place is that it is useful to have an intermediate state that only has cstr_require and not other cstr_* ops. So it seems useful to provide that pass in isolation.

As a simple example, CSE should be able to easily deduplicate is_broadcastable, but it cannot easily deduplicate the result of expanding is_broadcastable. Similarly, cstr_broadcastable cannot be easily deduplicated because it potentially aborts the program, which makes the reasoning more complicated for deduping it. So a user with lots of broadcasting in their program will likely want to run CSE after DecomposeConstraints() but before ShapeToStandard().

Also, we described use cases like `cstr_require (is_broadcastable(a, b) || is_broadcastable(b, c), "msg")`. We want to dedupe is_broadcastable calls between this predicate and `cstr_broadcastable(a, b)`, which will happen naturally after decomposing constraints.

Because this is just a bunch of patterns, it's easy for a user to call two separate populate* functions in their own pass that combines the two (if they are concerned about compile time performance). However, it's much harder for them to come upstream and separate it into two populate* functions (or better, passes so they don't have to roll their own downstream and setup legality and such) in order to get the CSE benefit I just described.

herhut accepted this revision. Nov 2 2020, 8:34 AM

herhut added inline comments.

mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.td
23	> Similarly, cstr_broadcastable cannot be easily deduplicated because it potentially aborts the program, which makes the reasoning more complicated for deduping it.

`cstr_broadcastable` can and should be CSE'd. If we have two identical constraints and one is dominated by the other, then the dominated one will never abort. `cstr_` operations have interesting semantics in that sense (and how we express those semantics is a different story).

The hope was to start with simple constraints and stay with them as far as possible to simplify analysis for now. Having a constraint of the form `cstr_require (is_broadcastable(a, b) || is_broadcastable(b, c), "msg")` makes analysis hard and I have a hard time imagining an example where such constraints would arise. If you have one, that would be really helpful.

I get the example of CSE where a program has both `is_broadcastable` and `cstr_broadcastable` and partial lowering is helpful. Again, I do not see a good example, though. What is very common is to have `cstr_broadcastable` and `broadcast` and we might want to look into optimizations there, if it becomes a performance issue down the road.

My instinct would be to keep it in one pass until we see a use case. That gives us one less knob to control. The decision is easy to change later and less knobs might be a good thing for now.
mlir/test/Conversion/ShapeToStandard/shape-to-standard.mlir
439	> I'm hoping to discuss the possibility of having a configuration so that operations aren't recursively legalized.

That would indeed be fairly helpful when testing lowerings that have multiple steps. The tests for scf to standard could benefit from this, as well.

silvas added inline comments. Nov 2 2020, 12:35 PM

mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.td
23	> cstr_broadcastable can and should be CSE'd. If we have two identical constraints and one is dominated by the other, then the dominated one will never abort. cstr_ operations have interesting semantics in that sense (and how we express those semantics is a different story).

Yes, but that reasoning is more difficult. I don't think anyone has immediate plans to define the properties needed and extend CSE to do this. We would need something like "idempotent side effect". And "somebody wants to CSE broadcasts today without proposing a new IR property" seems like a use case that we don't need to wait for IMO.

Given that we can trivially lower it to is_broadcastable and get CSE there (which we want anyways and seems to basically "come for free"), I don't see the advantage of wanting to do any reasoning on the cstr_broadcastable ops. As common as they might be, canonicalizing them immediately to a different form seems valuable (do you see any downside?). E.g. "<= const_integer" comparisons are very common, but LLVM turns all `x <= const` into `x < const+1`.

Anyway, feel free to submit this. We should circle back to this aspect of our intended design in our next shape meeting.
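
The CSE argument in this thread can be illustrated with a toy model. The sketch below is not MLIR code: `Op`, `toyCSE`, and the `pure` flag are hypothetical names introduced only for illustration. It assumes straight-line code (so every earlier op dominates every later one) and assumes constraint ops are treated as side-effecting, as argued above. Under those assumptions a plain value-numbering pass merges duplicated `is_broadcastable` predicates but leaves duplicated `cstr_broadcastable` ops alone.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical toy IR: every op defines exactly one value, `result`.
struct Op {
  std::string name;
  std::vector<int> operands; // value numbers consumed by this op
  int result;                // value number defined by this op
  bool pure;                 // assumption: true means "no side effects"
};

// Keeps the first occurrence of each pure (name, operands) pair and drops
// later duplicates, rewriting their uses. Ops that are not pure are always
// kept, because this toy pass has no way to reason about their effects.
std::vector<Op> toyCSE(const std::vector<Op> &ops) {
  std::map<std::pair<std::string, std::vector<int>>, int> seen;
  std::map<int, int> replacement; // dropped value -> surviving value
  std::vector<Op> kept;
  for (Op op : ops) {
    // Rewrite operands that referred to an op dropped earlier.
    for (int &operand : op.operands) {
      auto it = replacement.find(operand);
      if (it != replacement.end())
        operand = it->second;
    }
    if (op.pure) {
      auto key = std::make_pair(op.name, op.operands);
      auto it = seen.find(key);
      if (it != seen.end()) {
        replacement[op.result] = it->second;
        continue; // duplicate pure op: drop it
      }
      seen.emplace(key, op.result);
    }
    kept.push_back(op);
  }
  assert(kept.size() <= ops.size());
  return kept;
}
```

With the decomposed form (two `is_broadcastable` ops feeding two `cstr_require` ops) the second predicate is dropped and both requirements read the same value. Merging the two requirements themselves is the step that would need the extra "idempotent side effect" reasoning discussed in the thread.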

silvas accepted this revision. Nov 2 2020, 12:35 PM

This revision is now accepted and ready to land. Nov 2 2020, 12:35 PM

This revision was landed with ongoing or failed builds. Nov 3 2020, 12:57 AM

Closed by commit rGd05d42199f77: [mlir] Add partial lowering of shape.cstr_broadcastable. (authored by Tres Popp <tpopp@google.com>).

This revision was automatically updated to reflect the committed changes.

Tres Popp <tpopp@google.com> added a commit: rGd05d42199f77: [mlir] Add partial lowering of shape.cstr_broadcastable..

Revision Contents

Path                                                          Size
mlir/lib/Conversion/ShapeToStandard/CMakeLists.txt            5 lines
mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.cpp       8 lines
mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.td        27 lines
mlir/test/Conversion/ShapeToStandard/shape-to-standard.mlir   39 lines

Diff 302501

mlir/lib/Conversion/ShapeToStandard/CMakeLists.txt

+set(LLVM_TARGET_DEFINITIONS ShapeToStandard.td)
+mlir_tablegen(ShapeToStandard.cpp.inc -gen-rewriters)
+add_public_tablegen_target(ShapeToStandardIncGen)
+
 add_mlir_conversion_library(MLIRShapeToStandard
   ConvertShapeConstraints.cpp
   ShapeToStandard.cpp

   ADDITIONAL_HEADER_DIRS
   ${MLIR_MAIN_INCLUDE_DIR}/mlir/Conversion/ShapeToStandard

   DEPENDS
   MLIRConversionPassIncGen
+  ShapeToStandardIncGen

   LINK_COMPONENTS
   Core

   LINK_LIBS PUBLIC
   MLIREDSC
   MLIRIR
   MLIRShape
   MLIRPass
   MLIRSCF
   MLIRTransforms
   )

mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.cpp

...
   matchAndRewrite(ToExtentTensorOp op, ArrayRef<Value> operands,
...
     rewriter.replaceOpWithNewOp<TensorCastOp>(op, adaptor.input(),
                                               op.getType());
     return success();
   }
 };
 } // namespace

 namespace {
+/// Import the Shape Ops to Std Patterns.
+#include "ShapeToStandard.cpp.inc"
+} // namespace
+
+namespace {
 /// Conversion pass.
 class ConvertShapeToStandardPass
     : public ConvertShapeToStandardBase<ConvertShapeToStandardPass> {

   void runOnOperation() override;
 };
 } // namespace

 void ConvertShapeToStandardPass::runOnOperation() {
   // Setup target legality.
   MLIRContext &ctx = getContext();
   ConversionTarget target(ctx);
   target.addLegalDialect<StandardOpsDialect, SCFDialect>();
-  target.addLegalOp<FuncOp, ModuleOp, ModuleTerminatorOp>();
+  target.addLegalOp<CstrRequireOp, FuncOp, ModuleOp, ModuleTerminatorOp>();

   // Setup conversion patterns.
   OwningRewritePatternList patterns;
   populateShapeToStandardConversionPatterns(patterns, &ctx);

   // Apply conversion.
   auto module = getOperation();
   if (failed(applyPartialConversion(module, target, std::move(patterns))))
     signalPassFailure();
 }

 void mlir::populateShapeToStandardConversionPatterns(
     OwningRewritePatternList &patterns, MLIRContext *ctx) {
   // clang-format off
+  populateWithGenerated(ctx, patterns);
   patterns.insert<
       AnyOpConversion,
       BinaryOpConversion<AddOp, AddIOp>,
       BinaryOpConversion<MulOp, MulIOp>,
       BroadcastOpConverter,
       ConstShapeOpConverter,
       ConstSizeOpConversion,
       IsBroadcastableOpConverter,
...

mlir/lib/Conversion/ShapeToStandard/ShapeToStandard.td

This file was added.

				//==-- ShapeToStandard.td - Shape to Standard Patterns -------- tablegen -==//
				//
				// Part of the LLVM Project, under the Apache License v2.0 with LLVM Exceptions.
				// See https://llvm.org/LICENSE.txt for license information.
				// SPDX-License-Identifier: Apache-2.0 WITH LLVM-exception
				//
				//===----------------------------------------------------------------------===//
				//
				// Defines Patterns to lower Shape ops to Std.
				//
				//===----------------------------------------------------------------------===//

				#ifndef MLIR_CONVERSION_SHAPETOSTANDARD_TD
				#define MLIR_CONVERSION_SHAPETOSTANDARD_TD

				include "mlir/Dialect/Shape/IR/ShapeOps.td"

				def BroadcastableStringAttr : NativeCodeCall<[{
				$_builder.getStringAttr("required broadcastable shapes")
				}]>;

				def : Pat<(Shape_CstrBroadcastableOp $LHS, $RHS),
				(Shape_CstrRequireOp
				(Shape_IsBroadcastableOp $LHS, $RHS),
				(BroadcastableStringAttr))>;

				#endif // MLIR_CONVERSION_SHAPETOSTANDARD_TD

mlir/test/Conversion/ShapeToStandard/shape-to-standard.mlir

...
 // CHECK: %[[EXTENTS_ARE_EQUAL:.*]] = cmpi "eq", %[[LARGER_EXTENT]], %[[SMALLER_EXTENT]] : index
 // CHECK: %[[EITHER_EXTENT_IS_ONE:.*]] = or %[[LARGER_EXTENT_IS_ONE]], %[[SMALLER_EXTENT_IS_ONE]] : i1
 // CHECK: %[[OR_EXTENTS_ARE_EQUAL:.*]] = or %[[EITHER_EXTENT_IS_ONE]], %[[EXTENTS_ARE_EQUAL]] : i1
 // CHECK: %[[NEW_ALL_SO_FAR:.*]] = and %[[ALL_SO_FAR]], %[[OR_EXTENTS_ARE_EQUAL]] : i1
 // CHECK: scf.yield %[[NEW_ALL_SO_FAR]] : i1
 // CHECK: }
 // CHECK: return %[[ALL_RESULT]] : i1
 // CHECK: }
+
+// -----
+
+func @broadcast(%a : tensor<?xindex>, %b : tensor<?xindex>) -> !shape.witness {
+  %0 = shape.cstr_broadcastable %a, %b : tensor<?xindex>, tensor<?xindex>
+  return %0 : !shape.witness
+}
+
+// CHECK-LABEL: func @broadcast(
+// CHECK-SAME: %[[LHS:.*]]: tensor<?xindex>,
+// CHECK-SAME: %[[RHS:.*]]: tensor<?xindex>) -> !shape.witness {
+// CHECK: %[[C0:.*]] = constant 0 : index
+// CHECK: %[[C1:.*]] = constant 1 : index
+// CHECK: %[[LHS_RANK:.*]] = dim %[[LHS]], %[[C0]] : tensor<?xindex>
+// CHECK: %[[RHS_RANK:.*]] = dim %[[RHS]], %[[C0]] : tensor<?xindex>
+// CHECK: %[[LHS_SMALLER:.*]] = cmpi "ule", %[[LHS_RANK]], %[[RHS_RANK]] : index
+// CHECK: %[[SMALLER_RANK:.*]] = select %[[LHS_SMALLER]], %[[LHS_RANK]], %[[RHS_RANK]] : index
+// CHECK: %[[LARGER_RANK:.*]] = select %[[LHS_SMALLER]], %[[RHS_RANK]], %[[LHS_RANK]] : index
+// CHECK: %[[RANK_ERASED_LHS:.*]] = tensor_cast %[[LHS]] : tensor<?xindex> to tensor<?xindex>
+// CHECK: %[[RANK_ERASED_RHS:.*]] = tensor_cast %[[RHS]] : tensor<?xindex> to tensor<?xindex>
+// CHECK: %[[SMALLER_SHAPE:.*]] = select %[[LHS_SMALLER]], %[[RANK_ERASED_LHS]], %[[RANK_ERASED_RHS]] : tensor<?xindex>
+// CHECK: %[[LARGER_SHAPE:.*]] = select %[[LHS_SMALLER]], %[[RANK_ERASED_RHS]], %[[RANK_ERASED_LHS]] : tensor<?xindex>
+// CHECK: %[[RANK_DIFF:.*]] = subi %[[LARGER_RANK]], %[[SMALLER_RANK]] : index
+// CHECK: %[[TRUE:.*]] = constant true
+// CHECK: %[[ALL_RESULT:.*]] = scf.for %[[VAL_16:.*]] = %[[RANK_DIFF]] to %[[LARGER_RANK]] step %[[C1]] iter_args(%[[ALL_SO_FAR:.*]] = %[[TRUE]]) -> (i1) {
+// CHECK: %[[LARGER_EXTENT:.*]] = extract_element %[[LARGER_SHAPE]]{{\[}}%[[VAL_16]]] : tensor<?xindex>
+// CHECK: %[[LARGER_EXTENT_IS_ONE:.*]] = cmpi "eq", %[[LARGER_EXTENT]], %[[C1]] : index
+// CHECK: %[[LHS_EXTENT_INDEX:.*]] = subi %[[VAL_16]], %[[RANK_DIFF]] : index
+// CHECK: %[[SMALLER_EXTENT:.*]] = extract_element %[[SMALLER_SHAPE]]{{\[}}%[[LHS_EXTENT_INDEX]]] : tensor<?xindex>
+// CHECK: %[[SMALLER_EXTENT_IS_ONE:.*]] = cmpi "eq", %[[SMALLER_EXTENT]], %[[C1]] : index
+// CHECK: %[[EXTENTS_ARE_EQUAL:.*]] = cmpi "eq", %[[LARGER_EXTENT]], %[[SMALLER_EXTENT]] : index
+// CHECK: %[[EITHER_EXTENT_IS_ONE:.*]] = or %[[LARGER_EXTENT_IS_ONE]], %[[SMALLER_EXTENT_IS_ONE]] : i1
+// CHECK: %[[OR_EXTENTS_ARE_EQUAL:.*]] = or %[[EITHER_EXTENT_IS_ONE]], %[[EXTENTS_ARE_EQUAL]] : i1
+// CHECK: %[[NEW_ALL_SO_FAR:.*]] = and %[[ALL_SO_FAR]], %[[OR_EXTENTS_ARE_EQUAL]] : i1
+// CHECK: scf.yield %[[NEW_ALL_SO_FAR]] : i1
+// CHECK: }
+// CHECK: %[[RESULT:.*]] = shape.cstr_require %[[ALL_RESULT]], "required broadcastable shapes"
+// CHECK: return %[[RESULT]] : !shape.witness
+// CHECK: }
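
For readers following the CHECK lines, the loop that the lowered `scf.for` computes can be modeled in plain C++. This is only a sketch: shapes are represented as `std::vector<int64_t>` extents, the function name `isBroadcastable` is a hypothetical stand-in, and local names mirror the FileCheck variables in the test. It does not model the final `shape.cstr_require`, which wraps the boolean result into a witness.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Plain C++ model of the predicate the test expects after lowering.
// Shapes are compared right-aligned: the trailing extents of the larger
// shape are paired with the extents of the smaller one.
bool isBroadcastable(const std::vector<int64_t> &lhs,
                     const std::vector<int64_t> &rhs) {
  // LHS_SMALLER, SMALLER_SHAPE, LARGER_SHAPE: order the operands by rank.
  const bool lhsSmaller = lhs.size() <= rhs.size();
  const std::vector<int64_t> &smaller = lhsSmaller ? lhs : rhs;
  const std::vector<int64_t> &larger = lhsSmaller ? rhs : lhs;
  assert(smaller.size() <= larger.size());

  // RANK_DIFF: leading extents of the larger shape have no counterpart
  // in the smaller shape and are skipped.
  const std::size_t rankDiff = larger.size() - smaller.size();

  // ALL_SO_FAR starts as true and is and-ed with each per-extent check.
  bool allSoFar = true;
  for (std::size_t i = rankDiff; i < larger.size(); ++i) {
    const int64_t largerExtent = larger[i];
    const int64_t smallerExtent = smaller[i - rankDiff];
    const bool eitherExtentIsOne = largerExtent == 1 || smallerExtent == 1;
    const bool extentsAreEqual = largerExtent == smallerExtent;
    allSoFar = allSoFar && (eitherExtentIsOne || extentsAreEqual);
  }
  return allSoFar;
}
```

For example, extents {4, 1, 5} and {3, 5} pass the check because the paired extents are (1, 3) and (5, 5), while {2, 3} and {2} fail because the only pair is (3, 2).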