Consider the case of generic + pack with outer_dims_perm. It is
equivalent to generic + pack (without outer_dims_perm) + transpose, where
the transpose permutes the outer dimensions of the packed result. There
are two steps to bubble up the pack op across the generic op.
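The equivalence above can be checked on small concrete tensors. The
following is a minimal Python sketch, not the MLIR implementation:
`make_tensor`, `pack` and `transpose` are hypothetical helpers that model
tensor.pack and linalg.transpose on dict-backed tensors, assuming every
tile size divides its dimension (no padding_value).

```python
import itertools


def make_tensor(shape):
    """Hypothetical dense tensor: dict index-tuple -> row-major position."""
    idxs = itertools.product(*(range(n) for n in shape))
    return {idx: v for v, idx in enumerate(idxs)}


def pack(src, shape, inner_dims_pos, inner_tiles, outer_dims_perm=None):
    """Models tensor.pack; assumes tiles divide their dims (no padding)."""
    rank = len(shape)
    outer = list(shape)
    for pos, tile in zip(inner_dims_pos, inner_tiles):
        assert shape[pos] % tile == 0, "sketch assumes no padding"
        outer[pos] = shape[pos] // tile
    perm = list(outer_dims_perm) if outer_dims_perm else list(range(rank))
    # Result outer dim k is source outer dim perm[k]; tiles are appended.
    out_shape = [outer[p] for p in perm] + list(inner_tiles)
    result = {}
    for idx in itertools.product(*(range(n) for n in out_shape)):
        outer_idx = [0] * rank
        for k, p in enumerate(perm):
            outer_idx[p] = idx[k]
        src_idx = list(outer_idx)
        for pos, tile, i in zip(inner_dims_pos, inner_tiles, idx[rank:]):
            src_idx[pos] = outer_idx[pos] * tile + i
        result[idx] = src[tuple(src_idx)]
    return out_shape, result


def transpose(t, shape, perm):
    """Models linalg.transpose: result dim i is input dim perm[i]."""
    out_shape = [shape[p] for p in perm]
    result = {}
    for idx in itertools.product(*(range(n) for n in out_shape)):
        src_idx = [0] * len(shape)
        for i, p in enumerate(perm):
            src_idx[p] = idx[i]
        result[idx] = t[tuple(src_idx)]
    return out_shape, result


shape = [4, 9]
src = make_tensor(shape)
perm = [1, 0]
# pack with outer_dims_perm ...
lhs_shape, lhs = pack(src, shape, [0, 1], [2, 3], outer_dims_perm=perm)
# ... equals pack without it, followed by a transpose of the outer dims only.
base_shape, base = pack(src, shape, [0, 1], [2, 3])
full_perm = perm + [len(shape) + k for k in range(2)]
rhs_shape, rhs = transpose(base, base_shape, full_perm)
print(lhs_shape, lhs_shape == rhs_shape and lhs == rhs)  # [3, 2, 2, 3] True
```

The transpose permutation is the identity on the trailing tile
dimensions, which is why only the outer dimensions are affected.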
Step 1. Swap generic + pack -> pack + generic.
In this step, we bind the packing information to dimensions of the
iteration domain. With this information, we can pack each operand with
the corresponding data tile sizes; the packed inner dimensions are
appended to the indexing maps. Note that the outer dimensions of the
indexing maps are not changed at all.
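Step 1 can be modeled on indexing maps alone. In this hypothetical Python
sketch, each indexing map is assumed to be a projected permutation and is
written as the list of loop indices used by its results (so
(d0, d1) -> (d1) is [1]), with the output map last. `bubble_up_pack` is
an illustrative name, not an MLIR API.

```python
def bubble_up_pack(indexing_maps, num_loops, inner_dims_pos, inner_tiles):
    """Hypothetical model of Step 1 on list-encoded projected-permutation maps."""
    out_map = indexing_maps[-1]
    # Bind each packed output dim to the loop that indexes it; every tiled
    # loop gets a fresh inner loop appended to the iteration domain.
    tiled = []
    for k, (pos, tile) in enumerate(zip(inner_dims_pos, inner_tiles)):
        tiled.append((out_map[pos], tile, num_loops + k))
    new_maps, packs = [], []
    for m in indexing_maps:
        op_pos, op_tiles, new_inner = [], [], []
        for loop, tile, inner_loop in tiled:
            if loop in m:  # this operand is indexed by a tiled loop
                op_pos.append(m.index(loop))
                op_tiles.append(tile)
                new_inner.append(inner_loop)
        # Outer results are untouched; only the inner loops are appended.
        new_maps.append(list(m) + new_inner)
        packs.append({"inner_dims_pos": op_pos, "inner_tiles": op_tiles})
    return num_loops + len(tiled), new_maps, packs


# in0: (d0, d1), in1: broadcast (d1), out: (d0, d1); pack out with 8x32 tiles.
maps = [[0, 1], [1], [0, 1]]
n, new_maps, packs = bubble_up_pack(maps, 2, [0, 1], [8, 32])
print(n, new_maps)  # 4 [[0, 1, 2, 3], [1, 3], [0, 1, 2, 3]]
print(packs[1])     # {'inner_dims_pos': [0], 'inner_tiles': [32]}
```

The broadcast operand only receives the tile of the loop it actually
uses, which is what binding the packing information to the iteration
domain buys.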
Step 2. Fold the transpose into the generic op.
This step only updates the output indexing map, so we no longer have to
handle outer_dims_perm.
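Using the same hypothetical list encoding of indexing maps, Step 2
amounts to permuting the results of the output map: since
linalg.transpose defines result dim i as input dim perm[i], the folded
map takes result i from old result perm[i], while the appended inner
dimensions stay in place. This is a sketch, not the upstream code.

```python
def fold_outer_dims_perm(out_map, outer_dims_perm):
    """Hypothetical: fold a transpose of the outer dims into the output map."""
    rank = len(outer_dims_perm)
    # Identity on the trailing inner (tile) dimensions.
    full_perm = list(outer_dims_perm) + list(range(rank, len(out_map)))
    return [out_map[p] for p in full_perm]


# Output map after Step 1 is (d0, d1, d2, d3); the pack had outer_dims_perm = [1, 0].
print(fold_outer_dims_perm([0, 1, 2, 3], [1, 0]))  # [1, 0, 2, 3]
```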
There could be a step 3 that extracts the transpose op out again (i.e.,
generic -> transpose + generic), after which the transpose can be folded
into the pack op. This step is not done in this revision.
Co-authored-by: Lorenzo Chelini <l.chelini@icloud.com>