The AVX512 gather/scatter instructions only support scales up to 8. If the scale is larger, perform an explicit shift operation instead.
I'm not familiar with the requirements for the instruction -- is a scale that does not match the scatter element width actually supported? The generated instruction looks sensible, but I don't know if it is actually legal. If not, the check could be changed to compare against the element width.
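For illustration, a minimal sketch of the fallback described in this summary, not the actual patch. The names `ScaledIndex` and `legalizeScale` are hypothetical, and the sketch assumes the scale is a power of two so that it can be folded into a left shift of the index:

```cpp
#include <cstdint>

// Hypothetical result type: how a requested scale is split between an
// explicit shift on the index and the scale kept on the gather/scatter.
struct ScaledIndex {
  unsigned ShiftAmt; // explicit left shift applied to the index
  unsigned Scale;    // scale left on the gather/scatter itself
};

// Hypothetical helper, assuming Scale is a power of two.
// Scales up to 8 stay on the instruction; larger scales become
// an explicit shift with a remaining scale of 1.
constexpr ScaledIndex legalizeScale(uint64_t Scale) {
  if (Scale <= 8)
    return {0, static_cast<unsigned>(Scale)};
  unsigned Shift = 0;
  while ((uint64_t{1} << Shift) < Scale)
    ++Shift;
  return {Shift, 1};
}
```

With this split, a scale of 16 would turn into a shift by 4 with scale 1, while a scale of 8 would be left untouched.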
Gather sign-extends the index to 64 bits first and then multiplies by the scale. When the index is 32-bit, multiplying it by the scale directly may overflow.
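To make the overflow concern concrete, here is a small standalone sketch. The function names are hypothetical and this models only the arithmetic, not the actual lowering code:

```cpp
#include <cstdint>

// Scale applied while the index is still 32 bits wide:
// the product wraps modulo 2^32 before it is widened.
constexpr int64_t scaleThenExtend(int32_t Index, uint32_t Scale) {
  uint32_t Wrapped = static_cast<uint32_t>(Index) * Scale;
  return static_cast<int64_t>(static_cast<int32_t>(Wrapped));
}

// Order described in the comment above: sign-extend the index
// to 64 bits first, then multiply by the scale.
constexpr int64_t extendThenScale(int32_t Index, uint32_t Scale) {
  return static_cast<int64_t>(Index) * static_cast<int64_t>(Scale);
}
```

For small indices the two agree, but with an index of `0x10000000` and a scale of 16 the 32-bit product wraps to 0, whereas the extended form yields 2^32.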
BTW, can we just bail out in getUniformBase when the scale is not supported by the target?