Fix semantic in the distribute integration test based on offline feedback. This exposed a bug in block distribution, we need to make sure the id is multiplied by the stride of the vector. Fix the transformation and unit test.
Details
Diff Detail
Event Timeline
mlir/integration_test/Dialect/Vector/CPU/test-vector-distribute.mlir | ||
---|---|---|
22 | Not in this CL, but probably during a later cleanup, I would rename the "distribution" part to something better. Loop distribution is typically reserved for for { for s1 -> s1 s2 for } s2 what is done here is more stripmining, blocking or tiling or chunking in 1-D, or something named like that. | |
35 | c32 because 2x32=64, right? I am not super convinced I find this intermediate step much easier to understand than generating the chunked loop right away, but I hope you convince me in the discourse discussion |
mlir/test/Dialect/Vector/vector-distribution.mlir | ||
---|---|---|
3 | Why removing the "-LABEL" here? |
Changes to preserve semantic in the integration test based on discourse:
https://llvm.discourse.group/t/vector-vector-distribution-large-vector-to-small-vector/1983/12
I changed the integration test to match what I described here: https://llvm.discourse.group/t/vector-vector-distribution-large-vector-to-small-vector/1983/12
If it makes sense to all of you I can do the rest of the changes.
mlir/integration_test/Dialect/Vector/CPU/test-vector-distribute.mlir | ||
---|---|---|
22 | Makes sense. I can rename it in a future patch once we get more agreement on the design, | |
35 | Correct, right now the extract_map expects contiguous IDs. (%arg5 : 32 goes from 0 to 31). | |
mlir/test/Dialect/Vector/vector-distribution.mlir | ||
3 | That was done by mistake. I do need to change the CHECK-LABEL in the 3rd test otherwise the [[MAP]] variable gets reset after the CHECK-LABEL. |
Let's also update the ops' documentation to remove this section:
For instance, the following code: %a = vector.transfer_read %A[%c0]: memref<32xf32>, vector<32xf32> %b = vector.transfer_read %B[%c0]: memref<32xf32>, vector<32xf32> %c = addf %a, %b: vector<32xf32> vector.transfer_write %c, %C[%c0]: memref<32xf32>, vector<32xf32> can be rewritten to: %a = vector.transfer_read %A[%c0]: memref<32xf32>, vector<32xf32> %b = vector.transfer_read %B[%c0]: memref<32xf32>, vector<32xf32> %ea = vector.extract_map %a[%id : 32] : vector<32xf32> to vector<1xf32> %eb = vector.extract_map %b[%id : 32] : vector<32xf32> to vector<1xf32> %ec = addf %ea, %eb : vector<1xf32> %c = vector.insert_map %ec, %id, 32 : vector<1xf32> to vector<32xf32> vector.transfer_write %c, %C[%c0]: memref<32xf32>, vector<32xf32> Where %id can be an induction variable or an SPMD id going from 0 to 31. And then be rewritten to: %a = vector.transfer_read %A[%id]: memref<32xf32>, vector<1xf32> %b = vector.transfer_read %B[%id]: memref<32xf32>, vector<1xf32> %c = addf %a, %b: vector<1xf32> vector.transfer_write %c, %C[%id]: memref<32xf32>, vector<1xf32>
and instead focus on the actual op semantics, with an example e.g.
Examples: %idx0 = ... : index // dynamic computation producing the value 0 of index type %idx1 = ... : index // dynamic computation producing the value 1 of index type %0 = constant dense<0, 1, 2, 3>: vector<4xi32> %1 = vector.extract_map %0[%idx0 : 2] : vector<4xi32> to vector<2xi32> // extracts values [0, 1] %2 = vector.extract_map %0[%idx1 : 2] : vector<4xi32> to vector<2xi32> // extracts values [1, 2] ... (same for insert variant)
As the op semantics evolve, the examples will too.
Thanks @ThomasRaoux !
mlir/integration_test/Dialect/Vector/CPU/test-vector-distribute.mlir | ||
---|---|---|
35 | As discussed on discourse, this is transient state internal to the test pass and should not be exposed. Let's please have the test do 2 things. Input IR: %a = vector.transfer_read %in1[%c0], %cf0: memref<?xf32>, vector<256xf32> %b = vector.transfer_read %in2[%c0], %cf0: memref<?xf32>, vector<256xf32> %acc = addf %a, %b: vector<256xf32> vector.transfer_write %acc, %out[%c0]: vector<256xf32>, memref<?xf32> Output IR: scf.for %arg5 = %c0 to %c256 step %c8 { %a = vector.transfer_read %in1[%arg5], %cf0: memref<?xf32>, vector<8xf32> %b = vector.transfer_read %in2[%arg5], %cf0: memref<?xf32>, vector<8xf32> %acc = addf %a, %b: vector<8xf32> vector.transfer_write %acc, %out[%arg5]: vector<8xf32>, memref<?xf32> } The test should also run with and without the application of the test pass and produce the same result. |
I re-based the patch and changed the test to avoid starting from an intermediate state. Please take another look.
mlir/integration_test/Dialect/Vector/CPU/test-vector-distribute.mlir | ||
---|---|---|
35 | Done. Starting from just the vector add and running with and without the transformation pass. |
Could we add an extra RUN command that just does mlir-opt %s -test-vector-to-forloop and checks the presence of the forloop+vectors ?