Adds tests for full sum reduction (tensors summed up into scalars)
and the well-known sampled dense-dense matrix product (SDDMM). Refines
the optimization rules slightly to handle the summation better.
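For readers unfamiliar with the kernel, here is a minimal pure-Python sketch of SDDMM (the function name and the COO triple layout are illustrative assumptions, not the test's actual MLIR): for each nonzero of the sparse sampling matrix S, it scales the corresponding entry of the dense product A·B, so the full dense product is never materialized.

```python
def sddmm(s_coo, a, b):
    # Sampled dense-dense matrix multiplication (sketch, not the generated
    # code): s_coo is a list of (i, j, value) triples for the sparse
    # sampling matrix S; a and b are dense matrices as nested lists.
    # Only positions that are nonzero in S are ever computed.
    return [(i, j, v * sum(a[i][k] * b[k][j] for k in range(len(b))))
            for (i, j, v) in s_coo]
```

The output stays in the same sparse triple format as the input mask, which mirrors why the kernel benefits from sparse code generation.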
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The generated MLIR code looks good to me. Thank you for the tests / examples! :D
mlir/test/Dialect/Linalg/sparse_2d.mlir

- Line 1111: Is changing the order of the mapping here equivalent to changing the traversal order? For example, does (i,j,k) -> (j, i) mean A is DCSC instead of DCSR?
- Line 1123: Nit: Can we use S instead of A to indicate that it's a sparse mask? (That way, we can use A and B for A * B to be consistent with normal notation for multiplication.) I realize you'll need to change the variable names everywhere, so if you prefer to keep the original names, I'm okay with that too.
- Line 1142: VAL_13 and VAL_15 are the same here (the size of the reduction dimension). Could we just reuse VAL_13 in the future, or is there a specific reason to keep all tensor dimension sizes separate?

mlir/test/Dialect/Linalg/sparse_3d.mlir

- Line 1263: For summation, we actually can just loop through the nnz array (VAL_11) and forgo all the multi-level indexing. Is this what you mean when you said you prefer keeping all the i, j, k indices versus flattening?
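The shortcut asked about above can be sketched in plain Python (the `pos`/`crd`/`vals` names follow common compressed-sparse conventions and are assumptions, not the pass's actual variable names). Because a full reduction consumes every stored value regardless of its coordinates, the multi-level loop nest and a single flat pass over the values array compute the same result:

```python
def sum_nested(pos, crd, vals):
    # Full reduction via the compressed structure: walk each compressed
    # row, then the nonzeros within it. Note that the coordinate array
    # crd is never actually consulted -- which is exactly why the
    # flattened version below is valid.
    total = 0.0
    for i in range(len(pos) - 1):
        for k in range(pos[i], pos[i + 1]):
            total += vals[k]
    return total

def sum_flat(vals):
    # The flattened optimization: one pass over the nnz values array,
    # with no multi-level indexing (and much friendlier to vectorization).
    return sum(vals)
```

Both return the same scalar for any valid compressed input.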
mlir/test/Dialect/Linalg/sparse_2d.mlir

- Line 1111: Yes, for now just permuting the indices of the tensor access is similar to TACO's ordering.
- Line 1123: Sure, I don't mind doing that just here, since it makes sense for this kernel.
- Line 1142: No reason other than a mechanical translation of the dimensions. In the long run, we will probably use other APIs to find the loop sizes.

mlir/test/Dialect/Linalg/sparse_3d.mlir

- Line 1263: Yes! Making this one flattened loop will indeed be one of the future optimizations (since it vectorizes so much better too).
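The point about permuted access maps can be illustrated with a small Python sketch (the helper and lambda names are hypothetical): swapping the map to (i, j) -> (j, i) leaves the loop nest itself unchanged but makes it touch A column-by-column, i.e., in the order a CSC/DCSC-style layout favors rather than a CSR/DCSR-style one.

```python
def traverse(rows, cols, access_map):
    # Walk a fixed row-major loop nest (i outer, j inner) and record the
    # storage coordinate each iteration touches under the given index map.
    coords = []
    for i in range(rows):
        for j in range(cols):
            coords.append(access_map(i, j))
    return coords

identity = lambda i, j: (i, j)  # visits A row-by-row (DCSR-friendly)
swapped = lambda i, j: (j, i)   # visits A column-by-column (DCSC-friendly)
```

For a 2x2 matrix, the identity map yields row-major coordinate order while the swapped map yields column-major order, even though the loops themselves never change.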