This is an archive of the discontinued LLVM Phabricator instance.

[WIP] Make Linalg vectorizer lower affine.apply
ClosedPublic

Authored by awarzynski on Jan 23 2023, 8:31 AM.

Details

Summary

Uploaded to facilitate the discussion for
https://github.com/iree-org/iree/issues/10876. It is yet to be decided
whether this is the right approach.

Diff Detail

Event Timeline

awarzynski created this revision.Jan 23 2023, 8:31 AM
Herald added a project: Restricted Project.
awarzynski requested review of this revision.Jan 23 2023, 8:31 AM

For context, I would like to vectorize the following snippet:

#map0 = affine_map<(d0, d1, d2) -> (d0, d1, d2)>

func.func @generic_interchanged_transpose(%arg0: tensor<128x12x32xf32>, %arg3: index) -> tensor<128x12x32xf32> {
  %0 = tensor.empty() : tensor<128x12x32xf32>
  %1 = linalg.generic {indexing_maps = [#map0, #map0],
                       iterator_types = ["parallel", "parallel", "parallel"]}
    ins(%arg0 : tensor<128x12x32xf32>)
    outs(%0 : tensor<128x12x32xf32>) {
  ^bb0(%arg1: f32, %arg2: f32):

    %2 = linalg.index 2 : index
    %12 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%2, %arg3)
    %3 = arith.index_cast %12 : index to i32
    %4 = arith.uitofp %3 : i32 to f32
    %5 = arith.mulf %4, %arg1 : f32
    linalg.yield %5 : f32
  } -> tensor<128x12x32xf32>
  return %1 : tensor<128x12x32xf32>
}

transform.sequence failures(propagate) {
 ^bb1(%arg1: !pdl.operation):
   %0 = transform.structured.match ops{["linalg.generic"]} in %arg1
   %1 = get_closest_isolated_parent %0 : (!pdl.operation) -> !pdl.operation
   %2 = transform.structured.vectorize %1 { vectorize_nd_extract }
 }

So far I am doing it wrong:

mlir-opt -test-transform-dialect-interpreter -split-input-file input.mlir
within split at input.mlir:1 offset :11:10: error: 'linalg.index' op operation destroyed but still has uses
    %2 = linalg.index 2 : index
         ^
within split at input.mlir:1 offset :11:10: note: see current operation: %0 = "linalg.index"() {dim = 2 : i64} : () -> index

@dcaballe I've just noticed that you've recently added this comment - is that what's happening here?

awarzynski planned changes to this revision.Jan 23 2023, 8:36 AM

Fix the crash, add a test

Thanks for adding support for this! It looks great! A few comments

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1089

nit: ii typo

1095

When we have to change the insertion point, we use OpBuilder::InsertionGuard for RAII. I'd suggest moving this to a utility function so that the guard's lifetime matches the scope of the function. You can look at other examples in MLIR; just search for OpBuilder::InsertionGuard.

This IR change could be part of a larger set of "linalgOp pre-processing" transformations that happen right before vectorization starts, but after we know we can vectorize the op.
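The RAII pattern being suggested can be sketched generically. This stand-in Builder/InsertionGuard pair is hypothetical and purely illustrative (in MLIR the real helper is OpBuilder::InsertionGuard, which saves and restores the builder's insertion point the same way):

```cpp
#include <cassert>

// Hypothetical stand-in for an IR builder with a movable insertion point.
struct Builder {
  int insertionPoint = 0;
};

// RAII guard: saves the insertion point on construction and restores it on
// scope exit, mirroring the utility-function pattern suggested above.
class InsertionGuard {
public:
  explicit InsertionGuard(Builder &b) : builder(b), saved(b.insertionPoint) {}
  ~InsertionGuard() { builder.insertionPoint = saved; }

private:
  Builder &builder;
  int saved;
};

int emitAtTemporaryPoint(Builder &b) {
  InsertionGuard guard(b); // save the current insertion point
  b.insertionPoint = 42;   // move it for the duration of this scope only
  return b.insertionPoint;
} // guard's destructor restores the original insertion point here
```

Because the guard restores the state in its destructor, the insertion point is reset no matter how the function returns.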

1101–1107

Couldn't we just do rewriter.replaceOp(op, expanded) and avoid the manual use-def chain update?

mlir/test/Dialect/Linalg/vectorization.mlir
2039

This looks more like a new feature to me than a regression. I think we should match the decomposed ops and make sure they are vectorized accordingly.

Move the new code into a dedicated hook, convertAffineApply, and extend the test.

awarzynski added inline comments.Jan 26 2023, 9:09 AM
mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1095

This IR change could be part of a larger set of "linalgOp pre-processing" transformations that happen right before vectorization starts, but after we know we can vectorize the op.

I like this idea :) Just to double-check - that set is yet to be created, right?

1101–1107

Perhaps I'm being daft, but things go horribly wrong when I do that. I assume that's because rewriter.replaceOp invalidates the iterators in the surrounding for loop.
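The suspected failure mode, erasing an element while iterating over its container, can be sketched generically (nothing here is MLIR-specific; std::list stands in for the op list, and the fix shown is the classic erase-safe iteration, where erase hands back the next valid iterator):

```cpp
#include <cassert>
#include <list>

// Erase elements from a list while iterating over it without invalidating
// the loop iterator: erase() returns the next valid iterator, so we never
// advance a dangling one. Naively writing ++it after an erase is the kind
// of invalidation bug suspected with replaceOp inside the loop.
void eraseEvens(std::list<int> &ops) {
  for (auto it = ops.begin(); it != ops.end();) {
    if (*it % 2 == 0)
      it = ops.erase(it); // erase and resume at the next element
    else
      ++it;
  }
}
```

An alternative, which MLIR code often uses, is to first collect the ops to rewrite into a separate vector and only then mutate the IR.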

Simplify the hook

dcaballe accepted this revision.Jan 26 2023, 10:29 AM

Awesome!

mlir/lib/Dialect/Linalg/Transforms/Vectorization.cpp
1052

nit: // -> /// and . at the end per coding guidelines.

This revision is now accepted and ready to land.Jan 26 2023, 10:29 AM
This revision was landed with ongoing or failed builds.Jan 27 2023, 12:31 AM
This revision was automatically updated to reflect the committed changes.

We are seeing some correctness issues when integrating this commit. We attempted a fix in https://reviews.llvm.org/D143243, but it looks like something else is still not working.
For the following affine.apply:

func.func @_iota_dim0_dispatch_0_generic_2x3() {
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %c3 = arith.constant 3 : index
  %c2 = arith.constant 2 : index
  %c64 = arith.constant 64 : index
  %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c64) : !flow.dispatch.tensor<writeonly:tensor<2x3xf32>>
  %workgroup_id_x = hal.interface.workgroup.id[0] : index
  %workgroup_count_x = hal.interface.workgroup.count[0] : index
  %workgroup_id_y = hal.interface.workgroup.id[1] : index
  %workgroup_count_y = hal.interface.workgroup.count[1] : index
  %1 = affine.apply affine_map<()[s0] -> (s0 * 2)>()[%workgroup_id_y]
  %2 = affine.apply affine_map<()[s0] -> (s0 * 2)>()[%workgroup_count_y]
  %3 = affine.apply affine_map<()[s0] -> (s0 * 3)>()[%workgroup_id_x]
  %4 = affine.apply affine_map<()[s0] -> (s0 * 3)>()[%workgroup_count_x]
  scf.for %arg0 = %1 to %c2 step %2 {
    scf.for %arg1 = %3 to %c3 step %4 {
      %5 = flow.dispatch.tensor.load %0, offsets = [%arg0, %arg1], sizes = [2, 3], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<2x3xf32>> -> tensor<2x3xf32>
      %6 = scf.for %arg2 = %c0 to %c2 step %c1 iter_args(%arg3 = %5) -> (tensor<2x3xf32>) {
        %extracted_slice = tensor.extract_slice %arg3[%arg2, 0] [1, 3] [1, 1] : tensor<2x3xf32> to tensor<1x3xf32>
        %7 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} outs(%extracted_slice : tensor<1x3xf32>) attrs =  {__internal_linalg_transform__ = "1", lowering_config = #iree_codegen.lowering_config<tile_sizes = [[2, 3], [1, 16], [0, 0]]>} {
        ^bb0(%out: f32):
          %8 = linalg.index 0 : index
          %9 = affine.apply affine_map<(d0, d1, d2) -> (d0 + d1 + d2)>(%arg0, %8, %arg2)
          %10 = arith.index_cast %9 : index to i32
          %11 = arith.sitofp %10 : i32 to f32
          linalg.yield %11 : f32
        } -> tensor<1x3xf32>
        %inserted_slice = tensor.insert_slice %7 into %arg3[%arg2, 0] [1, 3] [1, 1] : tensor<1x3xf32> into tensor<2x3xf32>
        scf.yield %inserted_slice : tensor<2x3xf32>
      }
      flow.dispatch.tensor.store %6, %0, offsets = [%arg0, %arg1], sizes = [2, 3], strides = [1, 1] : tensor<2x3xf32> -> !flow.dispatch.tensor<writeonly:tensor<2x3xf32>>
    }
  }
  return
}

it looks like some dimensions are missing after the expansion:

func.func @_iota_dim1_dispatch_0_generic_2x3() {
  %cst = arith.constant dense<[0, 1, 2]> : vector<3xindex>
  %c1 = arith.constant 1 : index
  %c0 = arith.constant 0 : index
  %c3 = arith.constant 3 : index
  %c2 = arith.constant 2 : index
  %c64 = arith.constant 64 : index
  %0 = hal.interface.binding.subspan set(0) binding(0) type(storage_buffer) alignment(64) offset(%c64) : !flow.dispatch.tensor<writeonly:tensor<2x3xf32>>
  %workgroup_id_x = hal.interface.workgroup.id[0] : index
  %workgroup_count_x = hal.interface.workgroup.count[0] : index
  %workgroup_id_y = hal.interface.workgroup.id[1] : index
  %workgroup_count_y = hal.interface.workgroup.count[1] : index
  %1 = affine.apply affine_map<()[s0] -> (s0 * 2)>()[%workgroup_id_y]
  %2 = affine.apply affine_map<()[s0] -> (s0 * 2)>()[%workgroup_count_y]
  %3 = affine.apply affine_map<()[s0] -> (s0 * 3)>()[%workgroup_id_x]
  %4 = affine.apply affine_map<()[s0] -> (s0 * 3)>()[%workgroup_count_x]
  scf.for %arg0 = %1 to %c2 step %2 {
    scf.for %arg1 = %3 to %c3 step %4 {
      %5 = flow.dispatch.tensor.load %0, offsets = [%arg0, %arg1], sizes = [2, 3], strides = [1, 1] : !flow.dispatch.tensor<writeonly:tensor<2x3xf32>> -> tensor<2x3xf32>
      %6 = scf.for %arg2 = %c0 to %c2 step %c1 iter_args(%arg3 = %5) -> (tensor<2x3xf32>) {
        %7 = vector.broadcast %arg1 : index to vector<3xindex>
        %8 = arith.addi %7, %cst : vector<3xindex>
        %9 = arith.index_cast %8 : vector<3xindex> to vector<3xi32>
        %10 = arith.sitofp %9 : vector<3xi32> to vector<3xf32>
        %11 = vector.broadcast %10 : vector<3xf32> to vector<1x3xf32>
        %12 = vector.transfer_write %11, %arg3[%arg2, %c0] {in_bounds = [true, true]} : vector<1x3xf32>, tensor<2x3xf32>
        scf.yield %12 : tensor<2x3xf32>
      }
      flow.dispatch.tensor.store %6, %0, offsets = [%arg0, %arg1], sizes = [2, 3], strides = [1, 1] : tensor<2x3xf32> -> !flow.dispatch.tensor<writeonly:tensor<2x3xf32>>
    }
  }
  return
}

I didn't have time to look into the details, but I'm going to revert this for now so that we have a healthy ToT and nobody hits the same problem.
Sorry for the inconvenience.

awarzynski added a comment.EditedFeb 7 2023, 3:58 AM

@dcaballe It actually seems fine to me 🤔. I reduced your example to:

module {
  func.func @_iota_dim0_dispatch_0_generic_2x3(%1: index, %2: index, %3: index, %4: index, %5: tensor<2x3xf32>) -> tensor<2x3xf32>{
    %c3 = arith.constant 3 : index
    %6 = scf.for %arg1 = %3 to %c3 step %4 iter_args(%arg3 = %5) -> (tensor<2x3xf32>) {
      %extracted_slice = tensor.extract_slice %arg3[%arg1, 0] [1, 3] [1, 1] : tensor<2x3xf32> to tensor<1x3xf32>
      %7 = linalg.generic {indexing_maps = [affine_map<(d0, d1) -> (d0, d1)>], iterator_types = ["parallel", "parallel"]} outs(%extracted_slice : tensor<1x3xf32>) {
      ^bb0(%out: f32):
        %8 = linalg.index 0 : index
        %9 = affine.apply affine_map<(d0, d1) -> (d0 + d1)>(%arg1, %8)
        %10 = arith.index_cast %9 : index to i32
        %11 = arith.sitofp %10 : i32 to f32
        linalg.yield %11 : f32
      } -> tensor<1x3xf32>
      %inserted_slice = tensor.insert_slice %7 into %arg3[%arg1, 0] [1, 3] [1, 1] : tensor<1x3xf32> into tensor<2x3xf32>
      scf.yield %inserted_slice : tensor<2x3xf32>
    }
    return %6: tensor<2x3xf32>
  }

  transform.sequence  failures(propagate) {
  ^bb0(%arg0: !pdl.operation):
    %0 = transform.structured.match ops{["linalg.generic"]} in %arg0 : (!pdl.operation) -> !pdl.operation
    %1 = get_closest_isolated_parent %0 : (!pdl.operation) -> !pdl.operation
    %2 = transform.structured.vectorize %1 {vectorize_nd_extract}
  }
}

Here's the output:

$ bin/mlir-opt -test-transform-dialect-interpreter file.mlir
module {
  func.func @_iota_dim0_dispatch_0_generic_2x3(%arg0: index, %arg1: index, %arg2: index, %arg3: index, %arg4: tensor<2x3xf32>) -> tensor<2x3xf32> {
    %c0 = arith.constant 0 : index
    %c3 = arith.constant 3 : index
    %0 = scf.for %arg5 = %arg2 to %c3 step %arg3 iter_args(%arg6 = %arg4) -> (tensor<2x3xf32>) {
      %1 = arith.index_cast %arg5 : index to i32
      %2 = arith.sitofp %1 : i32 to f32
      %3 = vector.broadcast %2 : f32 to vector<1x3xf32>
      %4 = vector.transfer_write %3, %arg6[%arg5, %c0] {in_bounds = [true, true]} : vector<1x3xf32>, tensor<2x3xf32>
      scf.yield %4 : tensor<2x3xf32>
    }
    return %0 : tensor<2x3xf32>
  }
  transform.sequence  failures(propagate) {
  ^bb0(%arg0: !pdl.operation):
    %0 = transform.structured.match ops{["linalg.generic"]} in %arg0 : (!pdl.operation) -> !pdl.operation
    %1 = get_closest_isolated_parent %0 : (!pdl.operation) -> !pdl.operation
    %2 = transform.structured.vectorize %1 {vectorize_nd_extract}
  }
}

In this reduced example, linalg.index 0 : index yields a single value, 0, so that "element" of the affine.apply is not present in the output after the expansion. The same thing happens in your example (that's why there's only one arith.addi even though the affine.apply op adds three values).
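The folding in the reduced example can be replayed as a plain scalar loop nest. This is a hypothetical re-play, not MLIR code: the iteration space of the tensor<1x3xf32> region is 1x3, so linalg.index 0 only ever takes the value 0 and the affine.apply addend contributes nothing:

```cpp
#include <array>
#include <cassert>

// Scalar replay of the reduced linalg.generic body over a 1x3 region.
// Dim 0 has extent 1, so linalg.index 0 is always 0 and the result
// depends only on the loop-invariant operand arg1.
std::array<float, 3> iotaRow(int arg1) {
  std::array<float, 3> out{};
  for (int i = 0; i < 1; ++i) {   // linalg.index 0: the only value is 0
    for (int j = 0; j < 3; ++j) {
      int idx = arg1 + i;         // affine.apply (d0, d1) -> (d0 + d1)
      out[j] = static_cast<float>(idx); // arith.index_cast + arith.sitofp
    }
  }
  return out;
}
```

Every element of the row equals arg1, which matches the vectorized output above: a single index_cast/sitofp of the scalar followed by a broadcast, with no addition left.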

Does this make sense to you?