AMDGPU would like to have MVTs for v3i32, v3f32, v5i32, and v5f32. This
commit does not add them, but it makes the following preparatory changes:
- Fixed code in kernel arg handling that assumed vector types have a power-of-2 number of elements, and added v5 kernel arg tests.
- Added v5 tests for cost analysis.
Some of this patch is from Matt Arsenault, also of AMD.