This patch makes the transform.structured.pad op return also a handle
to the copy op that it inserts. This allows to continue transformation
on that op, such as mapping it to a GPU thread.
The patch was mainly authored by @springerm as part of the WIP patch
https://reviews.llvm.org/D156371, which also has an example usage of
this change.