This change adds memory space support to tensor.pad. (tensor.generate and tensor.from_elements do not support memory spaces yet.)
The memory space is inferred from the buffer of the source tensor.
Instead of lowering tensor.pad to tensor.generate + tensor.insert_slice, it is now lowered to bufferization.alloc_tensor (with the correct memory space) + scf.parallel + tensor.insert_slice.
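For illustration only, the three steps of the new lowering can be modeled in plain Python. This is a sketch of the semantics, not MLIR and not the code in this change: `Tensor2D`, `alloc_tensor` and `lower_pad` are hypothetical names, the memory space is modeled as a plain integer tag, and the padding amounts are assumed to be non-negative.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple


@dataclass
class Tensor2D:
    """Toy stand-in for a 2-D tensor whose buffer lives in a memory space."""
    data: List[List[float]]
    memory_space: int


def alloc_tensor(rows: int, cols: int, memory_space: int) -> Tensor2D:
    # Models bufferization.alloc_tensor: the contents are placeholders
    # until they are overwritten by the following steps.
    return Tensor2D([[0.0] * cols for _ in range(rows)], memory_space)


def lower_pad(src: Tensor2D,
              low: Tuple[int, int],
              high: Tuple[int, int],
              body: Callable[[int, int], float]) -> Tensor2D:
    src_rows = len(src.data)
    src_cols = len(src.data[0]) if src.data else 0
    rows = low[0] + src_rows + high[0]
    cols = low[1] + src_cols + high[1]

    # Step 1: allocate the result in the memory space of the source.
    out = alloc_tensor(rows, cols, src.memory_space)

    # Step 2: compute every element from the padding body. The iterations
    # are independent of each other, which is what allows a parallel loop.
    for i in range(rows):
        for j in range(cols):
            out.data[i][j] = body(i, j)

    # Step 3: insert the source at offset `low` (the insert_slice step).
    for i, row in enumerate(src.data):
        for j, value in enumerate(row):
            out.data[low[0] + i][low[1] + j] = value

    return out


# Usage: pad a 1x2 tensor in (hypothetical) memory space 3 with zeros.
src = Tensor2D([[1.0, 2.0]], memory_space=3)
padded = lower_pad(src, low=(1, 1), high=(0, 1), body=lambda i, j: 0.0)
```

The point of the model is that the allocation, unlike tensor.generate, has a place to carry the memory space, so the padded result ends up in the same space as the source.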
Memory space support for tensor.generate and tensor.from_elements is deferred to a later change, as it requires further design discussion.
Depends On D136767