VLST arguments are coerced to VLATs at the function boundary for
consistency with the VLAT ABI. They are then bitcast back to VLSTs in
the function prolog. Previously, this conversion was done through memory.
With the introduction of the llvm.vector.{insert,extract} intrinsics, we
can avoid going through memory here.
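For illustration, a minimal sketch of the pattern this affects, assuming an SVE target built with -msve-vector-bits=512; the IR in the comments is illustrative of the intended lowering, not copied from the patch:

```c
#include <arm_sve.h>

// VLST: a fixed-length (512-bit) view of the scalable svint32_t type.
typedef svint32_t fixed_int32_t __attribute__((arm_sve_vector_bits(512)));

// At the ABI boundary the VLST argument is coerced to the scalable VLAT
// type <vscale x 4 x i32>. Previously the prolog converted it back to the
// fixed type <16 x i32> via a stack store/reload; with this change the
// prolog can instead emit something like:
//   %x.fixed = call <16 x i32>
//     @llvm.experimental.vector.extract.v16i32.nxv4i32(<vscale x 4 x i32> %x, i64 0)
fixed_int32_t passthrough(fixed_int32_t x) {
  return x;
}
```

The return value goes the other way (fixed to scalable), which is where the insert intrinsic comes in.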
Depends on D92761
This is slightly confusing: the coercion done in TargetInfo is from fixed -> scalable, so VLSTs are represented as scalable vectors in function args/returns, yet this is casting back to fixed in the function prolog using llvm.experimental.vector.extract, as you mention in the commit message. Could this comment clarify that?