Starting with GCN 2nd generation, the ISA supports ds_read_b128 in addition to ds_read_b64. This patch adds an instruction pattern for ds_read_b128 and enables generation of the instruction.
In the vectorizer, this patch also widens the vector length so that the vectorizer generates 128-bit loads for the local address space, which are then translated to ds_read_b128.
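As a rough illustration of the vectorizer change described above, here is a minimal standalone sketch. It is not the code from the patch: the enums, the function name `maxLoadWidthInBits`, and the `DefaultBits` parameter are hypothetical stand-ins, and the 64-bit fallback for the first generation is an assumption based on ds_read_b64 being the widest single read available there.

```cpp
#include <cassert>

// Hypothetical stand-ins for the backend's address-space and generation enums.
enum class AddrSpace { Global, Local };
enum class Generation { GCN1, GCN2, GCN3 };

// Widest load, in bits, that the vectorizer should try to form.
// Assumption: for the local address space, 128-bit loads are only worth
// forming when ds_read_b128 exists (GCN 2nd generation and later);
// earlier hardware stays at 64 bits so the load maps to ds_read_b64.
// Other address spaces keep whatever default the caller passes in.
unsigned maxLoadWidthInBits(AddrSpace AS, Generation Gen, unsigned DefaultBits) {
  assert(DefaultBits != 0 && "default width must be non-zero");
  if (AS != AddrSpace::Local)
    return DefaultBits;
  return Gen >= Generation::GCN2 ? 128u : 64u;
}
```

A real implementation would query the subtarget rather than compare enum values directly; the comparison here only stands in for that feature check.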
It might be OK to say 128 anyway. You could still emit adjacent reads with ds_read2_b64 even when not using ds_read_b128. I don't think we currently do the 64-bit equivalent of the trick we use for 4-byte aligned 8-byte reads, but you might want to look into that.
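To make the suggestion concrete, here is a hedged sketch of how a 16-byte local load might be lowered depending on its known alignment. The enum, the function name `selectLocalRead128`, and the alignment thresholds (16 bytes for ds_read_b128, 8 bytes for a ds_read2_b64 pair) are assumptions for illustration only, not what the backend currently does.

```cpp
#include <cassert>

// Hypothetical result type: which lowering to use for a 16-byte local load.
enum class LocalRead128 {
  ReadB128,   // single ds_read_b128
  Read2B64,   // one ds_read2_b64 covering two adjacent 8-byte halves
  SplitLoads  // fall back to narrower loads
};

// Choose a lowering from the load's known alignment in bytes.
// Assumed thresholds: 16-byte alignment for ds_read_b128 and
// 8-byte alignment for ds_read2_b64.
LocalRead128 selectLocalRead128(unsigned AlignBytes) {
  assert(AlignBytes != 0 && (AlignBytes & (AlignBytes - 1)) == 0 &&
         "alignment must be a power of two");
  if (AlignBytes >= 16)
    return LocalRead128::ReadB128;
  if (AlignBytes >= 8)
    return LocalRead128::Read2B64;
  return LocalRead128::SplitLoads;
}
```

The middle case mirrors the existing handling of 4-byte aligned 8-byte reads, scaled up by a factor of two.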
Anything you change here would apply equally to REGION_ADDRESS.