Copy the result of the padded computation back to the original destination of the computation. This is important for bufferization: it ensures that the result of the computation does not suddenly materialize in a different buffer because of padding.
A bufferization.copy_tensor op is inserted for every (unpadded) result. These ops bufferize to memcpys, but they fold away if the padding itself folds away.
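
The idea can be sketched with a small conceptual model. This is plain Python, not MLIR and not the actual transformation; the function names `pad_to_multiple` and `padded_double_into`, the doubling computation, and the padding multiple of 4 are all hypothetical and chosen only for illustration. The computation runs on a padded temporary, and the unpadded part of the result is then copied back into the caller's original destination so that the result does not end up in a different buffer.

```python
# Conceptual model only: this is not MLIR and not the pass implementation.
# All names below are hypothetical and exist purely for illustration.

def pad_to_multiple(values, multiple, pad_value=0):
    """Return a new list padded with pad_value up to the next multiple."""
    remainder = len(values) % multiple
    if remainder == 0:
        return list(values)
    return list(values) + [pad_value] * (multiple - remainder)


def padded_double_into(src, dest, multiple=4):
    """Double every element of src using a padded temporary, then copy
    the unpadded part of the result back into the original dest."""
    padded_in = pad_to_multiple(src, multiple)   # padded copy of the input
    padded_out = [2 * x for x in padded_in]      # computation on padded data
    unpadded = padded_out[:len(src)]             # slice away the padding
    dest[:len(src)] = unpadded                   # copy back into the original destination
    return dest
```

In this model the caller's `dest` list is updated in place, so the result lives in the buffer the caller already owns rather than in the padded temporary.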
Depends On: D153552