We probably need to move where intrinsics are lowered to copies to
make this useful.
Why not put it in generic code to start with? I don't understand why the concern about conflicting alignment would prevent that. I mean, are you more confident about how to handle conflicting alignments for AMDGPU than you are for all other targets?
I know that in this case, even if the call site specified a lower alignment, you always get an alignment of at least 4. For other targets, I don't know whether a lower call-site alignment would need to be respected. Ultimately we need an assert_align pseudo to track the call site attribute.
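
To make the distinction concrete, here is a minimal sketch of the two policies being discussed. The helper names are hypothetical and are not LLVM API; the minimum of 4 is the AMDGPU case mentioned above, and whether any other target needs the second policy is exactly the open question.

```cpp
#include <algorithm>
#include <cstdint>

// Hypothetical helper: the target guarantees a minimum alignment, so a
// lower call-site value can safely be raised to that minimum.
constexpr uint64_t alignWithKnownMinimum(uint64_t CallSiteAlign,
                                         uint64_t TargetMinAlign) {
  return std::max(CallSiteAlign, TargetMinAlign);
}

// Hypothetical helper: the conservative policy for a target where it is
// unknown whether a lower call-site alignment must be respected.
constexpr uint64_t alignRespectingCallSite(uint64_t CallSiteAlign) {
  return CallSiteAlign;
}

// A call site claiming align 1 still yields 4 under the first policy.
static_assert(alignWithKnownMinimum(1, 4) == 4, "raised to target minimum");
// A higher call-site alignment is kept as-is.
static_assert(alignWithKnownMinimum(8, 4) == 8, "higher value preserved");
// Under the conservative policy the lower value is kept.
static_assert(alignRespectingCallSite(1) == 1, "call-site value respected");
```

Generic code could only apply the first policy if every target can state such a minimum; otherwise it would have to fall back to the second one.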