This patch adds support for copying a memref.subview to shared or private memory on the GPU. The global-to-shared memory copy is adapted from code implemented in IREE (https://github.com/iree-org/iree), but the private memory copy has not been implemented in IREE. This patch enables copying a subview from global->shared, global->private, and shared->private memory.
Our final aim is to let the user pass a copy layout, as an affine map, to the transform.promote op in order to support transposed memory copies. This map is a permutation of the original affine index map. Although this has been implemented and the user can copy data to an arbitrary layout, it is not included in this patch because we still have a problem with changing the index maps of linalg.generic operations to the transformed index map. You can find more in the following links: (Initial attempt to support layout map in promote op in transform dialect) (Fix data transpose in shared memory)
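To illustrate what a permuted copy layout means for a 2-D tile, here is a minimal sketch in plain Python. It is not the code in this patch and does not use any MLIR API; the function name `copy_with_layout` and the `perm` parameter are hypothetical, chosen only for illustration. The assumption is that the layout map is a pure permutation of the source indices, so `perm = (1, 0)` corresponds to a transposed copy.

```python
from typing import List, Sequence


def copy_with_layout(src: List[List[float]], perm: Sequence[int]) -> List[List[float]]:
    """Copy a 2-D tile, writing each element to the permuted destination index.

    Hypothetical helper: `perm` plays the role of the permutation layout map,
    e.g. (0, 1) is the identity layout and (1, 0) is a transpose.
    """
    shape = (len(src), len(src[0]))
    # The destination shape is the source shape with its dimensions permuted.
    dst_shape = tuple(shape[p] for p in perm)
    dst = [[0.0] * dst_shape[1] for _ in range(dst_shape[0])]
    for i in range(shape[0]):
        for j in range(shape[1]):
            idx = (i, j)
            # Apply the permutation to the source index to get the destination index.
            di, dj = (idx[p] for p in perm)
            dst[di][dj] = src[i][j]
    return dst


tile = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]]
transposed = copy_with_layout(tile, (1, 0))
```

With the identity permutation `(0, 1)` the destination matches the source; with `(1, 0)` each element `src[i][j]` lands at `dst[j][i]`. Any consumer that indexes the promoted buffer then has to use the same permuted indices, which is the linalg.generic index-map issue mentioned above.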
Nit: we tend to prefer OpenCL-style terminology, so s/shared/workgroup (which is the actual name of the attribute). s/thread/workitem, s/block/workgroup.