This is an archive of the discontinued LLVM Phabricator instance.

[mlir][vector] Improve vector distribute integration test and fix block distribution
ClosedPublic

Authored by ThomasRaoux on Oct 12 2020, 8:59 PM.

Details

Summary

Fix semantic in the distribute integration test based on offline feedback. This exposed a bug in block distribution, we need to make sure the id is multiplied by the stride of the vector. Fix the transformation and unit test.

Diff Detail

Event Timeline

ThomasRaoux created this revision.Oct 12 2020, 8:59 PM
Herald added a reviewer: aartbik. · View Herald Transcript
Herald added a project: Restricted Project. · View Herald Transcript
ThomasRaoux requested review of this revision.Oct 12 2020, 8:59 PM
ThomasRaoux retitled this revision from [mlir][vector] Improve vector distribute integration test to [mlir][vector] Improve vector distribute integration test and fix block distribution.
ThomasRaoux edited the summary of this revision. (Show Details)
aartbik added inline comments.Oct 13 2020, 3:16 PM
mlir/integration_test/Dialect/Vector/CPU/test-vector-distribute.mlir
22–24

Not in this CL, but probably during a later cleanup, I would rename the "distribution" part to something better. Loop distribution is typically reserved for

for { for

s1            ->        s1
s2                   for

} s2

what is done here is more stripmining, blocking or tiling or chunking in 1-D, or something named like that.

35–40

c32 because 2x32=64, right?

I am not super convinced I find this intermediate step much easier to understand than generating the chunked loop right away, but I hope you convince me in the discourse discussion

mehdi_amini added inline comments.Oct 14 2020, 10:20 AM
mlir/test/Dialect/Vector/vector-distribution.mlir
3 ↗(On Diff #297765)

Why removing the "-LABEL" here?

I changed the integration test to match what I described here: https://llvm.discourse.group/t/vector-vector-distribution-large-vector-to-small-vector/1983/12

If it makes sense to all of you I can do the rest of the changes.

mlir/integration_test/Dialect/Vector/CPU/test-vector-distribute.mlir
22–24

Makes sense. I can rename it in a future patch once we get more agreement on the design,

35–40

Correct, right now the extract_map expects contiguous IDs. (%arg5 : 32 goes from 0 to 31).
About iterative vs all at once transformation let's keep talking on Discourse :)

mlir/test/Dialect/Vector/vector-distribution.mlir
3 ↗(On Diff #297765)

That was done by mistake. I do need to change the CHECK-LABEL in the 3rd test otherwise the [[MAP]] variable gets reset after the CHECK-LABEL.

Let's also update the ops' documentation to remove this section:

For instance, the following code:

   %a = vector.transfer_read %A[%c0]: memref<32xf32>, vector<32xf32>
   %b = vector.transfer_read %B[%c0]: memref<32xf32>, vector<32xf32>
   %c = addf %a, %b: vector<32xf32>
   vector.transfer_write %c, %C[%c0]: memref<32xf32>, vector<32xf32>

   can be rewritten to:
   %a = vector.transfer_read %A[%c0]: memref<32xf32>, vector<32xf32>
   %b = vector.transfer_read %B[%c0]: memref<32xf32>, vector<32xf32>
   %ea = vector.extract_map %a[%id : 32] : vector<32xf32> to vector<1xf32>
   %eb = vector.extract_map %b[%id : 32] : vector<32xf32> to vector<1xf32>
   %ec = addf %ea, %eb : vector<1xf32>
   %c = vector.insert_map %ec, %id, 32 : vector<1xf32> to vector<32xf32>
   vector.transfer_write %c, %C[%c0]: memref<32xf32>, vector<32xf32>

   Where %id can be an induction variable or an SPMD id going from 0 to 31.

   And then be rewritten to:
   %a = vector.transfer_read %A[%id]: memref<32xf32>, vector<1xf32>
   %b = vector.transfer_read %B[%id]: memref<32xf32>, vector<1xf32>
   %c = addf %a, %b: vector<1xf32>
   vector.transfer_write %c, %C[%id]: memref<32xf32>, vector<1xf32>

and instead focus on the actual op semantics, with an example e.g.

Examples:

%idx0 = ... : index // dynamic computation producing the value 0 of index type
%idx1 = ... : index // dynamic computation producing the value 1 of index type
%0 = constant dense<0, 1, 2, 3>: vector<4xi32>
%1 = vector.extract_map %0[%idx0 : 2] : vector<4xi32> to vector<2xi32> // extracts values [0, 1]
%2 = vector.extract_map %0[%idx1 : 2] : vector<4xi32> to vector<2xi32> // extracts values [1, 2]
... (same for insert variant)

As the op semantics evolve, the examples will too.

Thanks @ThomasRaoux !

mlir/integration_test/Dialect/Vector/CPU/test-vector-distribute.mlir
35–40

As discussed on discourse, this is transient state internal to the test pass and should not be exposed.

Let's please have the test do 2 things.

Input IR:

%a = vector.transfer_read %in1[%c0], %cf0: memref<?xf32>, vector<256xf32>
%b = vector.transfer_read %in2[%c0], %cf0: memref<?xf32>, vector<256xf32>
%acc = addf %a, %b: vector<256xf32>
vector.transfer_write %acc, %out[%c0]: vector<256xf32>, memref<?xf32>

Output IR:

scf.for %arg5 = %c0 to %c256 step %c8 {
  %a = vector.transfer_read %in1[%arg5], %cf0: memref<?xf32>, vector<8xf32>
  %b = vector.transfer_read %in2[%arg5], %cf0: memref<?xf32>, vector<8xf32>
  %acc = addf %a, %b: vector<8xf32>
  vector.transfer_write %acc, %out[%arg5]: vector<8xf32>, memref<?xf32>
}

The test should also run with and without the application of the test pass and produce the same result.

Update integration test to start from a large vector without loop.

I re-based the patch and changed the test to avoid starting from an intermediate state. Please take another look.

mlir/integration_test/Dialect/Vector/CPU/test-vector-distribute.mlir
35–40

Done. Starting from just the vector add and running with and without the transformation pass.

mlir/integration_test/Dialect/Vector/CPU/test-vector-distribute.mlir
1–2

Could we add an extra RUN command that just does mlir-opt %s -test-vector-to-forloop and checks the presence of the forloop+vectors ?

6

blank line to delimit commands ?

This revision is now accepted and ready to land.Oct 29 2020, 12:36 PM

Add extra check to the integration test to make sure the transformation happened.

ThomasRaoux marked 2 inline comments as done.