This patch adds infrastructure for forward shape propagation to
LowerMatrixIntrinsics. It also updates the pass to use the shape
information to break up larger vector operations and to eliminate
unnecessary conversions between column-wise matrices and flattened
vectors: if shape information is available for an instruction, the
operation is lowered to a set of instructions operating on columns. For
example, a store of a matrix is broken down into separate stores for
each column. For users that do not have shape information (e.g. because
they do not yet support shape-aware lowering), we pack the result
columns into a flat vector and update those users.
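
The propagation step described above can be sketched on a toy instruction
list. This is only an illustration under stated assumptions, not the code in
the pass: Shape, Op, Inst, propagateShapes and needsFlatOperand are
hypothetical names, instructions are assumed to be listed in def-before-use
order, and stores are the only users that accept a shape.

```cpp
#include <cassert>
#include <map>
#include <vector>

// Toy model for illustration only; none of these names are LLVM APIs.
struct Shape {
  unsigned Rows;
  unsigned Cols;
};

enum class Op { Transpose, Store, Other };

struct Inst {
  Op Kind;
  int Operand;    // index of the instruction defining the operand, -1 if none
  unsigned Rows;  // Transpose only: rows of the input matrix
  unsigned Cols;  // Transpose only: columns of the input matrix
};

// Forward pass: seed shapes at matrix intrinsics, then hand them on to users
// that support shape-aware lowering (only stores in this toy model).
// Assumes instructions are listed in def-before-use order.
std::map<int, Shape> propagateShapes(const std::vector<Inst> &Insts) {
  std::map<int, Shape> Shapes;
  for (int I = 0; I < static_cast<int>(Insts.size()); ++I) {
    const Inst &In = Insts[I];
    assert(In.Operand < I && "operands must be defined before their users");
    if (In.Kind == Op::Transpose) {
      // Transposing a Rows x Cols matrix yields a Cols x Rows result.
      Shapes[I] = Shape{In.Cols, In.Rows};
    } else if (In.Kind == Op::Store) {
      auto It = Shapes.find(In.Operand);
      if (It != Shapes.end())
        Shapes[I] = It->second;
    }
  }
  return Shapes;
}

// A user needs its operand packed into a flat vector when the operand has a
// shape but the user itself did not receive one.
bool needsFlatOperand(const std::vector<Inst> &Insts,
                      const std::map<int, Shape> &Shapes, int User) {
  int Def = Insts[User].Operand;
  return Def >= 0 && Shapes.count(Def) != 0 && Shapes.count(User) == 0;
}
```

In this sketch a store fed by a transpose inherits the transposed shape and
can be lowered per column, while any other user of the transpose is flagged
as needing the packed flat vector.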

It also adds shape-aware lowering for the first non-intrinsic
instruction: vector stores.

Example:

For

    %c = call <4 x double> @llvm.matrix.transpose(<4 x double> %a, i32 2, i32 2)
    store <4 x double> %c, <4 x double>* %Ptr

Without shape propagation, we generate the code below. Note %9, which
combines the columns of the transposed matrix into a flat vector.

    %split = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 0, i32 1>
    %split1 = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 2, i32 3>
    %1 = extractelement <2 x double> %split, i64 0
    %2 = insertelement <2 x double> undef, double %1, i64 0
    %3 = extractelement <2 x double> %split1, i64 0
    %4 = insertelement <2 x double> %2, double %3, i64 1
    %5 = extractelement <2 x double> %split, i64 1
    %6 = insertelement <2 x double> undef, double %5, i64 0
    %7 = extractelement <2 x double> %split1, i64 1
    %8 = insertelement <2 x double> %6, double %7, i64 1
    %9 = shufflevector <2 x double> %4, <2 x double> %8, <4 x i32> <i32 0, i32 1, i32 2, i32 3>
    store <4 x double> %9, <4 x double>* %Ptr

With this patch, we propagate the 2x2 shape information from the
transpose to the store and generate the code below. Note that we store
the columns directly and do not need an extra shuffle.

    %9 = bitcast <4 x double>* %Ptr to double*
    %10 = bitcast double* %9 to <2 x double>*
    store <2 x double> %4, <2 x double>* %10, align 8
    %11 = getelementptr double, double* %9, i32 2
    %12 = bitcast double* %11 to <2 x double>*
    store <2 x double> %8, <2 x double>* %12, align 8
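
The addressing in the column stores above can be mimicked in plain C++. This
is a hedged sketch rather than what the pass emits: Column and storeColumns
are hypothetical names, and it assumes the matrix is laid out contiguously in
column-major order, so column J starts J * Rows elements past the base
pointer (offset 2 for the second column of the 2x2 example).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: one matrix column held as its own vector.
using Column = std::vector<double>;

// Store each column directly at its offset from the base pointer, assuming a
// contiguous column-major layout. No packed flat vector is built first.
void storeColumns(const std::vector<Column> &Cols, double *Ptr) {
  const std::size_t Rows = Cols.empty() ? 0 : Cols[0].size();
  for (std::size_t J = 0; J < Cols.size(); ++J) {
    assert(Cols[J].size() == Rows && "all columns must have the same length");
    double *ColPtr = Ptr + J * Rows;  // start of column J in memory
    for (std::size_t I = 0; I < Rows; ++I)
      ColPtr[I] = Cols[J][I];
  }
}
```

Storing the columns this way leaves memory in the same state as storing the
packed flat vector, which is why the extra shuffle is not needed.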