This is an archive of the discontinued LLVM Phabricator instance.

[mlir][sparse] Sparse reduction in lex order no longer produces dense output
Closed, Public

Authored by jim22k on Jan 18 2023, 12:02 PM.

Details

Summary

Previously, when performing a reduction on a sparse tensor, the result
would differ depending on iteration order. With the expanded access pattern,
an empty row contributed no entry to the output; with lex ordering, the
reduction identity ended up in the output.

This change tracks whether any entries were actually reduced during lex
ordering, making the output consistent between the two iteration styles.
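
To make the old inconsistency concrete, here is a minimal Python sketch of the
semantics (not the actual codegen; the data and the skip_empty switch are made
up for illustration). It computes a row-wise product of a CSR-like matrix in
which row 1 stores no entries:

def rowwise_prod(pointers, values, identity=1.0, skip_empty=True):
    out = {}  # sparse output: row index -> reduced value
    for row in range(len(pointers) - 1):
        acc, seen = identity, False
        for p in range(pointers[row], pointers[row + 1]):
            acc, seen = acc * values[p], True
        if seen or not skip_empty:
            out[row] = acc
    return out

pointers, values = [0, 2, 2, 3], [2.0, 3.0, 5.0]
# Old lex-order behavior: the empty row 1 receives the identity 1.0.
print(rowwise_prod(pointers, values, skip_empty=False))  # {0: 6.0, 1: 1.0, 2: 5.0}
# Expanded access pattern, and lex order after this change: row 1 is skipped.
print(rowwise_prod(pointers, values, skip_empty=True))   # {0: 6.0, 2: 5.0}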

Diff Detail

Event Timeline

jim22k created this revision. Jan 18 2023, 12:02 PM
Herald added a project: Restricted Project. · View Herald Transcript · Jan 18 2023, 12:02 PM
jim22k requested review of this revision. Jan 18 2023, 12:02 PM

To show this in action, consider this function from test/Integration/Dialect/SparseTensor/CPU/sparse_reduce_custom.mlir:

func.func @redProdLex(%arga: tensor<?x?xf64, #CSR>) -> tensor<?xf64, #SparseVector> {
  %c0 = arith.constant 0 : index
  %cf1 = arith.constant 1.0 : f64
  %d0 = tensor.dim %arga, %c0 : tensor<?x?xf64, #CSR>
  %xv = bufferization.alloc_tensor(%d0): tensor<?xf64, #SparseVector>
  %0 = linalg.generic #trait_mat_reduce_rowwise
    ins(%arga: tensor<?x?xf64, #CSR>)
    outs(%xv: tensor<?xf64, #SparseVector>) {
      ^bb(%a: f64, %b: f64):
        %1 = sparse_tensor.reduce %a, %b, %cf1 : f64 {
            ^bb0(%x: f64, %y: f64):
              %2 = arith.mulf %x, %y : f64
              sparse_tensor.yield %2 : f64
          }
        linalg.yield %1 : f64
  } -> tensor<?xf64, #SparseVector>
  return %0 : tensor<?xf64, #SparseVector>
}

This will be converted to:

func.func @redProdLex(%arg0: tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>) -> tensor<?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ] }>> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %false = arith.constant false
  %true = arith.constant true
  %cst = arith.constant 1.000000e+00 : f64
  %dim = tensor.dim %arg0, %c0 : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>
  %0 = bufferization.alloc_tensor(%dim) : tensor<?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ] }>>
  %dim_0 = tensor.dim %arg0, %c0 : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>
  %1 = sparse_tensor.pointers %arg0 {dimension = 1 : index} : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>> to memref<?xindex>
  %2 = sparse_tensor.values %arg0 : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>> to memref<?xf64>
  %3 = scf.for %arg1 = %c0 to %dim_0 step %c1 iter_args(%arg2 = %0) -> (tensor<?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ] }>>) {
    %5 = memref.load %1[%arg1] : memref<?xindex>
    %6 = arith.addi %arg1, %c1 : index
    %7 = memref.load %1[%6] : memref<?xindex>
    %8:2 = scf.for %arg3 = %5 to %7 step %c1 iter_args(%arg4 = %cst, %arg5 = %false) -> (f64, i1) {
      %10 = memref.load %2[%arg3] : memref<?xf64>
      %11 = arith.mulf %10, %arg4 : f64
      scf.yield %11, %true : f64, i1
    } {"Emitted from" = "linalg.generic"}
    %9 = scf.if %8#1 -> (tensor<?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ] }>>) {
      %10 = sparse_tensor.insert %8#0 into %arg2[%arg1] : tensor<?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ] }>>
      scf.yield %10 : tensor<?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ] }>>
    } else {
      scf.yield %arg2 : tensor<?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ] }>>
    }
    scf.yield %9 : tensor<?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ] }>>
  } {"Emitted from" = "linalg.generic"}
  %4 = sparse_tensor.load %3 hasInserts : tensor<?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ] }>>
  return %4 : tensor<?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "compressed" ] }>>
}

The tracking begins at %8:2. Notice that the reduction identity %cst is passed as an iteration argument along with %false. As soon as the inner for-loop executes at least one iteration, its second result becomes %true. This boolean record of having iterated at least once lets us either insert the reduced value (the true branch of %9) or skip the insertion (the false branch of %9).
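
In rough pseudo-Python (invented variable names; this only mirrors the structure of the IR above and is not what the compiler emits), the generated loop nest behaves like:

pointers = [0, 2, 2, 3]      # row pointers (%1); row 1 is empty
values = [2.0, 3.0, 5.0]     # stored values (%2)
output = {}                  # sparse vector result

for i in range(len(pointers) - 1):               # outer scf.for over rows (%3)
    acc, valid = 1.0, False                      # iter_args (%arg4 = %cst, %arg5 = %false)
    for p in range(pointers[i], pointers[i + 1]):  # inner scf.for over stored entries (%8:2)
        acc = values[p] * acc                    # arith.mulf
        valid = True                             # the loop yields %true once it runs
    if valid:                                    # scf.if on %8#1 (%9)
        output[i] = acc                          # sparse_tensor.insert
    # else: leave the output unchanged
print(output)                                    # {0: 6.0, 2: 5.0}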

The other use case is CSR @ CSC matrix multiplication, which involves while-loops.

func.func @min_plus_csrcsc(%arga: tensor<?x?xf64, #CSR>,
                           %argb: tensor<?x?xf64, #CSC>) -> tensor<?x?xf64, #CSR> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %maxf = arith.constant 1.0e999 : f64
  %d0 = tensor.dim %arga, %c0 : tensor<?x?xf64, #CSR>
  %d1 = tensor.dim %argb, %c1 : tensor<?x?xf64, #CSC>
  %xm = bufferization.alloc_tensor(%d0, %d1) : tensor<?x?xf64, #CSR>
  %0 = linalg.generic #trait_matmul
     ins(%arga, %argb: tensor<?x?xf64, #CSR>, tensor<?x?xf64, #CSC>)
      outs(%xm: tensor<?x?xf64, #CSR>) {
      ^bb(%a: f64, %b: f64, %output: f64):
        %1 = sparse_tensor.binary %a, %b : f64, f64 to f64
          overlap = {
            ^bb0(%x: f64, %y: f64):
              %3 = arith.addf %x, %y : f64
              sparse_tensor.yield %3 : f64
          }
          left={}
          right={}
        %2 = sparse_tensor.reduce %1, %output, %maxf : f64 {
            ^bb0(%x: f64, %y: f64):
              %cmp = arith.cmpf "olt", %x, %y : f64
              %3 = arith.select %cmp, %x, %y : f64
              sparse_tensor.yield %3 : f64
          }
        linalg.yield %2 : f64
  } -> tensor<?x?xf64, #CSR>
  return %0 : tensor<?x?xf64, #CSR>
}

This is converted to:

func.func @min_plus_csrcsc(%arg0: tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>, %arg1: tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)> }>>) -> tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>> {
  %c0 = arith.constant 0 : index
  %c1 = arith.constant 1 : index
  %false = arith.constant false
  %true = arith.constant true
  %cst = arith.constant 0x7FF0000000000000 : f64
  %dim = tensor.dim %arg0, %c0 : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>
  %dim_0 = tensor.dim %arg1, %c1 : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)> }>>
  %0 = bufferization.alloc_tensor(%dim, %dim_0) : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>
  %dim_1 = tensor.dim %arg0, %c0 : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>
  %1 = sparse_tensor.pointers %arg0 {dimension = 1 : index} : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>> to memref<?xindex>
  %2 = sparse_tensor.indices %arg0 {dimension = 1 : index} : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>> to memref<?xindex>
  %3 = sparse_tensor.values %arg0 : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>> to memref<?xf64>
  %dim_2 = tensor.dim %arg1, %c1 : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)> }>>
  %4 = sparse_tensor.pointers %arg1 {dimension = 1 : index} : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)> }>> to memref<?xindex>
  %5 = sparse_tensor.indices %arg1 {dimension = 1 : index} : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)> }>> to memref<?xindex>
  %6 = sparse_tensor.values %arg1 : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ], dimOrdering = affine_map<(d0, d1) -> (d1, d0)> }>> to memref<?xf64>
  %7 = scf.for %arg2 = %c0 to %dim_1 step %c1 iter_args(%arg3 = %0) -> (tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>) {
    %9 = scf.for %arg4 = %c0 to %dim_2 step %c1 iter_args(%arg5 = %arg3) -> (tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>) {
      %10 = memref.load %1[%arg2] : memref<?xindex>
      %11 = arith.addi %arg2, %c1 : index
      %12 = memref.load %1[%11] : memref<?xindex>
      %13 = memref.load %4[%arg4] : memref<?xindex>
      %14 = arith.addi %arg4, %c1 : index
      %15 = memref.load %4[%14] : memref<?xindex>
      %16:5 = scf.while (%arg6 = %10, %arg7 = %13, %arg8 = %cst, %arg9 = %false, %arg10 = %arg5) : (index, index, f64, i1, tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>) -> (index, index, f64, i1, tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>) {
        %18 = arith.cmpi ult, %arg6, %12 : index
        %19 = arith.cmpi ult, %arg7, %15 : index
        %20 = arith.andi %18, %19 : i1
        scf.condition(%20) %arg6, %arg7, %arg8, %arg9, %arg10 : index, index, f64, i1, tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>
      } do {
      ^bb0(%arg6: index, %arg7: index, %arg8: f64, %arg9: i1, %arg10: tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>):
        %18 = memref.load %2[%arg6] : memref<?xindex>
        %19 = memref.load %5[%arg7] : memref<?xindex>
        %20 = arith.cmpi ult, %19, %18 : index
        %21 = arith.select %20, %19, %18 : index
        %22 = arith.cmpi eq, %18, %21 : index
        %23 = arith.cmpi eq, %19, %21 : index
        %24 = arith.andi %22, %23 : i1
        %25:3 = scf.if %24 -> (f64, i1, tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>) {
          %32 = memref.load %3[%arg6] : memref<?xf64>
          %33 = memref.load %6[%arg7] : memref<?xf64>
          %34 = arith.addf %32, %33 : f64
          %35 = arith.cmpf olt, %34, %arg8 : f64
          %36 = arith.select %35, %34, %arg8 : f64
          scf.yield %36, %true, %arg10 : f64, i1, tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>
        } else {
          scf.yield %arg8, %arg9, %arg10 : f64, i1, tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>
        }
        %26 = arith.cmpi eq, %18, %21 : index
        %27 = arith.addi %arg6, %c1 : index
        %28 = arith.select %26, %27, %arg6 : index
        %29 = arith.cmpi eq, %19, %21 : index
        %30 = arith.addi %arg7, %c1 : index
        %31 = arith.select %29, %30, %arg7 : index
        scf.yield %28, %31, %25#0, %25#1, %25#2 : index, index, f64, i1, tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>
      } attributes {"Emitted from" = "linalg.generic"}
      %17 = scf.if %16#3 -> (tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>) {
        %18 = sparse_tensor.insert %16#2 into %16#4[%arg2, %arg4] : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>
        scf.yield %18 : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>
      } else {
        scf.yield %16#4 : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>
      }
      scf.yield %17 : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>
    } {"Emitted from" = "linalg.generic"}
    scf.yield %9 : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>
  } {"Emitted from" = "linalg.generic"}
  %8 = sparse_tensor.load %7 hasInserts : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>
  return %8 : tensor<?x?xf64, #sparse_tensor.encoding<{ dimLevelType = [ "dense", "compressed" ] }>>
}

The important change is at %16:5, where %false is added to the while-loop's argument list. Later, at %25:3, the true branch represents an overlap in the dot product, and therefore a non-empty result, so it yields %true. The false branch simply forwards the previous "valid lex insert" flag. As long as at least one overlap has occurred, we perform the insert at %17; otherwise we skip it.
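
In the same spirit, here is a minimal Python sketch of the co-iteration (invented data and helper name, not the actual codegen): walk one CSR row and one CSC column in coordinate order, reduce overlaps with min-plus, and report whether any overlap occurred so the caller knows whether to insert.

import math

def min_plus_dot(a_idx, a_val, b_idx, b_val, identity=math.inf):
    ia, ib = 0, 0
    acc, valid = identity, False                 # iter_args (%arg8 = %cst, %arg9 = %false)
    while ia < len(a_idx) and ib < len(b_idx):   # scf.while condition (%20)
        i, j = a_idx[ia], b_idx[ib]
        if i == j:                               # overlap: both coordinates match (%24)
            acc = min(a_val[ia] + b_val[ib], acc)
            valid = True                         # yield %true for this dot product
        if i <= j:
            ia += 1                              # advance the CSR-row cursor
        if j <= i:
            ib += 1                              # advance the CSC-column cursor
    return acc, valid                            # insert only if valid (%17)

print(min_plus_dot([0, 2, 5], [1.0, 4.0, 2.0], [2, 3], [3.0, 1.0]))  # (7.0, True)
print(min_plus_dot([0, 1], [1.0, 1.0], [2, 3], [3.0, 1.0]))          # (inf, False)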

This is good stuff! Thanks for adding, Jim, and sorry for taking a bit longer on my review.

mlir/lib/Dialect/SparseTensor/Transforms/CodegenEnv.cpp
77

Part of the CodegenEnv's role is also to verify consistency, so perhaps add an else assert(!redValidInsert) here?

237

This is good, and in line with the bookkeeping validation.
But can it be even stricter?
I.e., if isReduc(), do we also need val?

EDIT, added later: WDYT of a clearValidLexInsert()?

mlir/lib/Dialect/SparseTensor/Transforms/CodegenEnv.h
179

"to determine if" at first reading implies a boolean value, so please be a bit more specific (when not set ... when set ....)

mlir/lib/Dialect/SparseTensor/Transforms/Sparsification.cpp
658

Can you put more documentation here, i.e. pseudo-code? Just saying "true/false branch" is not very informative.

(And I agree that genInsertionStore could have used a lot more documentation to start with, but we are now growing beyond what is easy to read, so bonus points for documenting some of the other branches too ;-)

893

Perhaps we even want a clearValidLexInsert() for this? It would also give better error detection when the assert in CodegenEnv fails.

jim22k updated this revision to Diff 495872. Feb 8 2023, 9:13 AM
jim22k marked 5 inline comments as done.
Add clearValidLexInsert() and improve comments
aartbik accepted this revision. Feb 9 2023, 2:04 PM

Solid work. Thanks!

mlir/lib/Dialect/SparseTensor/Transforms/CodegenEnv.cpp
242

yeah, very nice!

This revision is now accepted and ready to land. Feb 9 2023, 2:04 PM
This revision was landed with ongoing or failed builds. Feb 10 2023, 11:09 AM
This revision was automatically updated to reflect the committed changes.