Following the review comments on D8820 (Teach Loop Vectorizer about interleaved data accesses), I have split that patch. This is the first part, which adds support for two new intrinsics:
<4 x double> @llvm.indexed.load.v4f64 (double* <ptr>, <4 x i32> <index>, i32 <alignment>)
void @llvm.indexed.store.v4f64 (<4 x double> <value>, double* <ptr>, <4 x i32> <index>, i32 <alignment>)
These intrinsics can express interleaved load/store, strided load/store, etc.
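To make the intended behaviour concrete, here is a minimal scalar model of the two intrinsics in C++. It is only a sketch: the names indexedLoad and indexedStore are hypothetical, and it assumes that each index counts whole elements from the base pointer (not bytes) and that the alignment argument does not affect the result.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical scalar model of the indexed load: lane i reads base[index[i]].
// Assumption: indices are in units of elements, not bytes.
template <typename T, std::size_t N>
std::array<T, N> indexedLoad(const T *base,
                             const std::array<std::int32_t, N> &index) {
  assert(base != nullptr);
  std::array<T, N> result{};
  for (std::size_t i = 0; i < N; ++i)
    result[i] = base[index[i]];
  return result;
}

// Hypothetical scalar model of the indexed store: lane i writes base[index[i]].
template <typename T, std::size_t N>
void indexedStore(const std::array<T, N> &value, T *base,
                  const std::array<std::int32_t, N> &index) {
  assert(base != nullptr);
  for (std::size_t i = 0; i < N; ++i)
    base[index[i]] = value[i];
}
```

Under this model, an index of <0, 2, 4, 6> reads the even elements of an interleaved pair (a stride-2 access), and <0, 3, 6, 9> would be a stride-3 access.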
I am a bit worried about the name "indexed". The name indexed load/store is already used for loads/stores with indexed addressing modes (pre-increment, post-increment, pre-dec...). There is also masked load/store for predicated load/store. I could not find a better name for load/store with indices. If you think this name is confusing and there is a better one, I can change it.
The implementation follows that of the masked load/store. This patch mainly does the following:
(1) Add two new intrinsics and update LangRef.rst.
(2) Add code generation for the new intrinsics. Add AArch64 backend codegen for interleaved load/store, which is a subset of indexed load/store.
(3) Teach CodeGenPrepare to scalarize unsupported indexed load/store.
There is no code in the Legalization phase, because the AArch64 backend does not support any indexed load/store other than interleaved load/store. Even if I added such code, I could not test it. In any case, CodeGenPrepare handles the unsupported cases.
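As an illustration of how supported and unsupported cases might be told apart, the sketch below classifies a constant index vector. It is not the code in this patch: the function names are hypothetical, and it assumes an access counts as "interleaved" when the indices have a constant stride between 2 and some target maximum (for example 4, matching the AArch64 LD2/LD3/LD4 family). Anything else would be left for scalarization.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Returns true and sets `stride` if index[i] == index[0] + i * stride for
// every lane. Vectors with fewer than two lanes have no defined stride.
bool isConstantStride(const std::vector<std::int32_t> &index,
                      std::int32_t &stride) {
  if (index.size() < 2)
    return false;
  const std::int32_t candidate = index[1] - index[0];
  for (std::size_t i = 2; i < index.size(); ++i)
    if (index[i] - index[i - 1] != candidate)
      return false;
  stride = candidate;
  return true;
}

// Hypothetical policy: treat the access as interleaved when the stride lies
// in [2, maxFactor]. maxFactor is an assumed target limit, not taken from
// the patch.
bool isInterleavedAccess(const std::vector<std::int32_t> &index,
                         std::int32_t maxFactor) {
  std::int32_t stride = 0;
  return isConstantStride(index, stride) && stride >= 2 && stride <= maxFactor;
}
```

With maxFactor set to 4, <0, 2, 4, 6> is classified as interleaved, while <0, 5, 1, 7> and <0, 8, 16, 24> are not and would fall back to the scalarized form.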
There are TODOs in CodeGenPrepare: some indexed load/store can be transformed into "a VectorLoad + a ShuffleVector" or "a ShuffleVector + a VectorStore".
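The load half of that TODO can be modelled as follows. This is a sketch of the idea only, not the planned implementation: the function names are hypothetical, and it assumes element-sized, non-negative indices and that the whole range between the smallest and largest index is safe to read.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Model of a single-source ShuffleVector: lane i takes wide[mask[i]].
std::vector<double> shuffle(const std::vector<double> &wide,
                            const std::vector<std::int32_t> &mask) {
  std::vector<double> result;
  result.reserve(mask.size());
  for (std::int32_t m : mask)
    result.push_back(wide[static_cast<std::size_t>(m)]);
  return result;
}

// Rewrites an indexed load as one contiguous load covering [lo, hi],
// followed by a shuffle whose mask is the index vector rebased to lo.
std::vector<double> indexedLoadViaShuffle(const double *base,
                                          const std::vector<std::int32_t> &index) {
  assert(base != nullptr && !index.empty());
  const std::int32_t lo = *std::min_element(index.begin(), index.end());
  const std::int32_t hi = *std::max_element(index.begin(), index.end());
  // The "VectorLoad": a single contiguous read of hi - lo + 1 elements.
  std::vector<double> wide(base + lo, base + hi + 1);
  // The "ShuffleVector": select the requested lanes from the wide value.
  std::vector<std::int32_t> mask;
  mask.reserve(index.size());
  for (std::int32_t idx : index)
    mask.push_back(idx - lo);
  return shuffle(wide, mask);
}
```

The store direction would be the mirror image: shuffle the value into place first, then issue one contiguous store, which is only valid when every element in the covered range is written.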