Currently, the transfer mask is materialized by generating the vector
comparison: [offset + 0, .., offset + length - 1] < [dim, .., dim].
A better alternative is to materialize the transfer mask with a single
operation, vector.create_mask (dim - offset), which generates
simpler code and composes better with scalable vectors.
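
As a rough illustration of why the two forms should agree, the sketch below
models both masks on plain Python lists. The function names are hypothetical,
and clamping the bound to [0, length] is an assumption about how
vector.create_mask treats out-of-range bounds; it is not stated above.

```python
def mask_by_comparison(offset, length, dim):
    """Current form: compare [offset + 0, .., offset + length - 1] < [dim, .., dim]."""
    return [offset + i < dim for i in range(length)]


def mask_by_create_mask(offset, length, dim):
    """Proposed form: enable the first (dim - offset) lanes.

    Assumption: a bound below 0 gives an all-false mask and a bound above
    `length` gives an all-true mask, so the bound is clamped here.
    """
    bound = max(0, min(dim - offset, length))
    return [i < bound for i in range(length)]


# Example: a 4-lane transfer starting at offset 5 in a dimension of size 7.
mask_a = mask_by_comparison(offset=5, length=4, dim=7)
mask_b = mask_by_create_mask(offset=5, length=4, dim=7)
```

In this model only the second form avoids building the per-lane index vector,
which is the part that depends on knowing the vector length up front.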