Fix semantic in the distribute integration test based on offline feedback. This exposed a bug in block distribution, we need to make sure the id is multiplied by the stride of the vector. Fix the transformation and unit test.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
mlir/integration_test/Dialect/Vector/CPU/test-vector-distribute.mlir | ||
---|---|---|
30–31 | Not in this CL, but probably during a later cleanup, I would rename the "distribution" part to something better. Loop distribution is typically reserved for for { for s1 -> s1 s2 for } s2 what is done here is more stripmining, blocking or tiling or chunking in 1-D, or something named like that. | |
42–43 | c32 because 2x32=64, right? I am not super convinced I find this intermediate step much easier to understand than generating the chunked loop right away, but I hope you convince me in the discourse discussion |
mlir/test/Dialect/Vector/vector-distribution.mlir | ||
---|---|---|
3 | Why removing the "-LABEL" here? |
Changes to preserve semantic in the integration test based on discourse:
https://llvm.discourse.group/t/vector-vector-distribution-large-vector-to-small-vector/1983/12
I changed the integration test to match what I described here: https://llvm.discourse.group/t/vector-vector-distribution-large-vector-to-small-vector/1983/12
If it makes sense to all of you I can do the rest of the changes.
mlir/integration_test/Dialect/Vector/CPU/test-vector-distribute.mlir | ||
---|---|---|
30–31 | Makes sense. I can rename it in a future patch once we get more agreement on the design, | |
42–43 | Correct, right now the extract_map expects contiguous IDs. (%arg5 : 32 goes from 0 to 31). | |
mlir/test/Dialect/Vector/vector-distribution.mlir | ||
3 | That was done by mistake. I do need to change the CHECK-LABEL in the 3rd test otherwise the [[MAP]] variable gets reset after the CHECK-LABEL. |
Let's also update the ops' documentation to remove this section:
For instance, the following code: %a = vector.transfer_read %A[%c0]: memref<32xf32>, vector<32xf32> %b = vector.transfer_read %B[%c0]: memref<32xf32>, vector<32xf32> %c = addf %a, %b: vector<32xf32> vector.transfer_write %c, %C[%c0]: memref<32xf32>, vector<32xf32> can be rewritten to: %a = vector.transfer_read %A[%c0]: memref<32xf32>, vector<32xf32> %b = vector.transfer_read %B[%c0]: memref<32xf32>, vector<32xf32> %ea = vector.extract_map %a[%id : 32] : vector<32xf32> to vector<1xf32> %eb = vector.extract_map %b[%id : 32] : vector<32xf32> to vector<1xf32> %ec = addf %ea, %eb : vector<1xf32> %c = vector.insert_map %ec, %id, 32 : vector<1xf32> to vector<32xf32> vector.transfer_write %c, %C[%c0]: memref<32xf32>, vector<32xf32> Where %id can be an induction variable or an SPMD id going from 0 to 31. And then be rewritten to: %a = vector.transfer_read %A[%id]: memref<32xf32>, vector<1xf32> %b = vector.transfer_read %B[%id]: memref<32xf32>, vector<1xf32> %c = addf %a, %b: vector<1xf32> vector.transfer_write %c, %C[%id]: memref<32xf32>, vector<1xf32>
and instead focus on the actual op semantics, with an example e.g.
Examples: %idx0 = ... : index // dynamic computation producing the value 0 of index type %idx1 = ... : index // dynamic computation producing the value 1 of index type %0 = constant dense<0, 1, 2, 3>: vector<4xi32> %1 = vector.extract_map %0[%idx0 : 2] : vector<4xi32> to vector<2xi32> // extracts values [0, 1] %2 = vector.extract_map %0[%idx1 : 2] : vector<4xi32> to vector<2xi32> // extracts values [1, 2] ... (same for insert variant)
As the op semantics evolve, the examples will too.
Thanks @ThomasRaoux !
mlir/integration_test/Dialect/Vector/CPU/test-vector-distribute.mlir | ||
---|---|---|
42–43 | As discussed on discourse, this is transient state internal to the test pass and should not be exposed. Let's please have the test do 2 things. Input IR: %a = vector.transfer_read %in1[%c0], %cf0: memref<?xf32>, vector<256xf32> %b = vector.transfer_read %in2[%c0], %cf0: memref<?xf32>, vector<256xf32> %acc = addf %a, %b: vector<256xf32> vector.transfer_write %acc, %out[%c0]: vector<256xf32>, memref<?xf32> Output IR: scf.for %arg5 = %c0 to %c256 step %c8 { %a = vector.transfer_read %in1[%arg5], %cf0: memref<?xf32>, vector<8xf32> %b = vector.transfer_read %in2[%arg5], %cf0: memref<?xf32>, vector<8xf32> %acc = addf %a, %b: vector<8xf32> vector.transfer_write %acc, %out[%arg5]: vector<8xf32>, memref<?xf32> } The test should also run with and without the application of the test pass and produce the same result. |
I re-based the patch and changed the test to avoid starting from an intermediate state. Please take another look.
mlir/integration_test/Dialect/Vector/CPU/test-vector-distribute.mlir | ||
---|---|---|
42–43 | Done. Starting from just the vector add and running with and without the transformation pass. |
Could we add an extra RUN command that just does mlir-opt %s -test-vector-to-forloop and checks the presence of the forloop+vectors ?