This patch adds infrastructure for forward shape propagation to
LowerMatrixIntrinsics. It also updates the pass to make use of
the shape information to break up larger vector operations and to
eliminate unnecessary conversion operations between columnwise matrixes
and flattened vectors: if shape information is available for an
instruction, lower the operation to a set of instructions operating on
columns. For example, a store of a matrix is broken down into separate
stores for each column. For users that do not have shape
information (e.g. because they do not yet support shape information
aware lowering), we pack the result columns into a flat vector and
update those users.
It also adds shape aware lowering for the first non-intrinsic
instruction: vector stores.
Example:
For
%c = call <4 x double> @llvm.matrix.transpose(<4 x double> %a, i32 2, i32 2) store <4 x double> %c, <4 x double>* %Ptr
We generate the code below without shape propagation. Note %9 which
combines the columns of the transposed matrix into a flat vector.
%split = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 0, i32 1> %split1 = shufflevector <4 x double> %a, <4 x double> undef, <2 x i32> <i32 2, i32 3> %1 = extractelement <2 x double> %split, i64 0 %2 = insertelement <2 x double> undef, double %1, i64 0 %3 = extractelement <2 x double> %split1, i64 0 %4 = insertelement <2 x double> %2, double %3, i64 1 %5 = extractelement <2 x double> %split, i64 1 %6 = insertelement <2 x double> undef, double %5, i64 0 %7 = extractelement <2 x double> %split1, i64 1 %8 = insertelement <2 x double> %6, double %7, i64 1 %9 = shufflevector <2 x double> %4, <2 x double> %8, <4 x i32> <i32 0, i32 1, i32 2, i32 3> store <4 x double> %9, <4 x double>* %Ptr
With this patch, we propagate the 2x2 shape information from the
transpose to the store and we generate the code below. Note that we
store the columns directly and do not need an extra shuffle.
%9 = bitcast <4 x double>* %Ptr to double* %10 = bitcast double* %9 to <2 x double>* store <2 x double> %4, <2 x double>* %10, align 8 %11 = getelementptr double, double* %9, i32 2 %12 = bitcast double* %11 to <2 x double>* store <2 x double> %8, <2 x double>* %12, align 8
produced