Currently, Clang does not generate individual stores when only some elements of a vector are updated. For the code below:
typedef float v4sf __attribute__ ((vector_size(16)));
void foo(v4sf *a) {
  (*a)[0] = 1;
  (*a)[3] = 2;
}
LLVM generates a shuffle instruction for it, even if only one element is updated, whereas GCC generates individual stores (at least on PowerPC).
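To make the goal concrete, here is a small source-level sketch of the equivalence the optimization relies on. The names foo_split and sameResult are hypothetical helpers for this illustration and are not part of the patch; the sketch assumes a compiler that supports the GCC vector extension (GCC or Clang).

```cpp
#include <cassert>
#include <cstring>

typedef float v4sf __attribute__((vector_size(16)));

// Whole-vector form from the description: load, update two lanes, store.
void foo(v4sf *a) {
  (*a)[0] = 1;
  (*a)[3] = 2;
}

// Hypothetical source-level picture of the split form: only the two
// changed elements are written, the other lanes are never touched.
void foo_split(float *p) {
  assert(p != nullptr);
  p[0] = 1;
  p[3] = 2;
}

// Checks that both forms leave identical bytes in memory.
bool sameResult() {
  v4sf v = {10, 20, 30, 40};
  float s[4] = {10, 20, 30, 40};
  foo(&v);
  foo_split(s);
  return std::memcmp(&v, s, sizeof(s)) == 0;
}
```

Both functions leave the same memory contents, which is why replacing the vector store with element stores is legal when the untouched lanes come straight from the load of the same address.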
Also, if we have a chain of shufflevector/insertelement instructions, we can walk through it, track the status of each element, find which elements are updated, and finally replace the original vector store with multiple element stores. This patch implements that.
This optimization happens in DAGCombiner, since each platform can easily set rules about turning it on in its own version of the hook method. The steps of the optimization are:
- Start at a vector store and walk up through its value operand until we find a load.
- On the path from the store to the load, accept only insert/shuffle nodes as operands.
- Track how each element is modified between the load and the store. Quit if we need to extract from other vectors.
- Generate stores for the elements changed along the path, replacing the original vector store.
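The steps above can be sketched on a toy IR as follows. This is a minimal illustration, not the patch's DAGCombiner code: Node, Kind, ElementStore and splitStore are hypothetical names, inserted scalars are assumed to be constants, and the sketch gives up whenever a lane is moved or taken from a second vector.

```cpp
#include <cassert>
#include <vector>

// Toy stand-ins for the node kinds allowed on the path (hypothetical types).
enum class Kind { Load, Insert, Shuffle };

struct Node {
  Kind kind;
  const Node *vec;        // vector operand; null for Load
  int index;              // Insert: lane being written
  float value;            // Insert: scalar being written (assumed constant)
  std::vector<int> mask;  // Shuffle: [0, N) picks a lane of vec,
                          // anything else means another vector or undef
};

struct ElementStore {
  int lane;
  float value;
};

// Walks from the stored value up to the load, one result lane at a time.
// Returns false if the chain cannot be expressed as element stores.
bool splitStore(const Node *stored, int numLanes,
                std::vector<ElementStore> &out) {
  out.clear();
  for (int i = 0; i < numLanes; ++i) {
    const Node *cur = stored;
    int lane = i;  // lane of `cur` that feeds result lane i
    bool resolved = false;
    while (cur->kind != Kind::Load) {
      if (cur->kind == Kind::Insert) {
        if (cur->index == lane) {  // this insert defines the lane
          out.push_back({i, cur->value});
          resolved = true;
          break;
        }
      } else {  // Shuffle
        if (static_cast<int>(cur->mask.size()) != numLanes) {
          out.clear();
          return false;
        }
        int m = cur->mask[lane];
        if (m < 0 || m >= numLanes) {  // needs another vector: quit
          out.clear();
          return false;
        }
        lane = m;
      }
      assert(cur->vec && "Insert/Shuffle must have a vector operand");
      cur = cur->vec;
    }
    // Reached the load: the lane is unchanged only if it was not moved.
    if (!resolved && lane != i) {
      out.clear();
      return false;  // simplification: moved lanes are not handled here
    }
  }
  return true;
}
```

For the foo example, the chain is load, insert at lane 0, insert at lane 3, store, and the sketch reports two element stores (lanes 0 and 3) while leaving lanes 1 and 2 untouched.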
A target hook, isCheapToSplitStore, is added to gate the optimization. Currently only the PowerPC target turns it on.
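The gating pattern can be sketched as below. Only the hook name isCheapToSplitStore comes from this patch; the classes ToyTargetLowering and ToyPPCTargetLowering, the parameters, and the "at most half the lanes" heuristic are assumptions made up for illustration and do not reflect the real signature or the PowerPC cost model.

```cpp
#include <cassert>

// Hypothetical stand-in for a target-lowering interface (not LLVM's class).
struct ToyTargetLowering {
  virtual ~ToyTargetLowering() = default;
  // Default: splitting is never considered cheap, so the combine stays off.
  virtual bool isCheapToSplitStore(unsigned /*numLanes*/,
                                   unsigned /*numChanged*/) const {
    return false;
  }
};

// Hypothetical target that opts in.
struct ToyPPCTargetLowering : ToyTargetLowering {
  bool isCheapToSplitStore(unsigned numLanes,
                           unsigned numChanged) const override {
    // Assumed heuristic: split only if at most half of the lanes changed.
    return numChanged > 0 && numChanged * 2 <= numLanes;
  }
};

// The combine would consult the hook before rewriting the store.
bool shouldSplitStore(const ToyTargetLowering &tli, unsigned numLanes,
                      unsigned numChanged) {
  assert(numChanged <= numLanes);
  return tli.isCheapToSplitStore(numLanes, numChanged);
}
```

Keeping the default at false means targets that have not evaluated the trade-off see no change in generated code.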
Discussion: http://lists.llvm.org/pipermail/llvm-dev/2019-September/135432.html http://lists.llvm.org/pipermail/llvm-dev/2019-October/135638.html
Alignment seems strange. Please use clang-format