Hi All,
The code below was not being vectorized through the vector shuffle (AltShuffle) path in SLPVectorizer.
float fa[4],fb[4],fc[4];
void fn() {
  fc[0] = fb[0]+fa[0];
  fc[1] = fa[1]-fb[1];
  fc[2] = fa[2]+fb[2];
  fc[3] = fa[3]-fb[3];
}
This was because we took the operands in the given order, and fb[0] and fa[1] are not consecutive accesses. But '+' is commutative for both float and int, the types for which we handle ShuffleVector, so we can reorder the addition fb[0] + fa[0] -> fa[0] + fb[0]. In that case buildTree_rec is able to conclude that the loads are consecutive and vectorize them.
In this patch we check whether reordering the operands of commutative operations in AltShuffle would allow vectorization. If it would, we reorder the operands of the commutative operation.
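To make the idea concrete, here is a minimal standalone sketch of the reordering described above. It is not the patch itself and does not use any LLVM API: Load, Lane, isConsecutive and reorderCommutativeLanes are hypothetical names invented for illustration, and a load is modelled simply as an array name plus an index.

```cpp
#include <cstddef>
#include <initializer_list>
#include <utility>
#include <vector>

// Hypothetical stand-in for a scalar load such as fa[1].
struct Load {
    char base;  // 'a' for fa, 'b' for fb
    int index;  // element index
};

// Hypothetical stand-in for one scalar lane: lhs <op> rhs.
struct Lane {
    char op;  // '+' is treated as commutative, '-' is not
    Load lhs;
    Load rhs;
};

// True if `next` reads the element right after `prev` in the same array.
static bool isConsecutive(const Load &prev, const Load &next) {
    return prev.base == next.base && next.index == prev.index + 1;
}

// True if both operand columns form consecutive runs across all lanes.
static bool lanesAreConsecutive(const std::vector<Lane> &lanes) {
    for (std::size_t i = 1; i < lanes.size(); ++i) {
        if (!isConsecutive(lanes[i - 1].lhs, lanes[i].lhs) ||
            !isConsecutive(lanes[i - 1].rhs, lanes[i].rhs))
            return false;
    }
    return true;
}

// Swap operands of '+' lanes where that lines the loads up.
// Lane 0 has no predecessor, so both of its orientations are tried.
// On success `lanes` holds the reordered operands and true is returned;
// on failure `lanes` is left untouched.
static bool reorderCommutativeLanes(std::vector<Lane> &lanes) {
    for (bool swapFirst : {false, true}) {
        std::vector<Lane> candidate = lanes;
        if (swapFirst) {
            if (candidate.empty() || candidate[0].op != '+')
                continue;  // only a commutative lane may be swapped
            std::swap(candidate[0].lhs, candidate[0].rhs);
        }
        for (std::size_t i = 1; i < candidate.size(); ++i) {
            Lane &cur = candidate[i];
            const Load &prevLhs = candidate[i - 1].lhs;
            if (cur.op == '+' && !isConsecutive(prevLhs, cur.lhs) &&
                isConsecutive(prevLhs, cur.rhs))
                std::swap(cur.lhs, cur.rhs);
        }
        if (lanesAreConsecutive(candidate)) {
            lanes = candidate;
            return true;
        }
    }
    return false;
}
```

Feeding in the four lanes of fn() (fb[0]+fa[0], fa[1]-fb[1], fa[2]+fb[2], fa[3]-fb[3]) swaps only lane 0, after which the left operands are fa[0..3] and the right operands are fb[0..3]. The subtraction lanes are never swapped, since that would change the result.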
Please let me know if this is good to commit.
Thanks and Regards
Karthik Bhat
I don't think reorderInputsAccordingToOpcode currently handles it. That is, it can accidentally handle it in some cases, but it doesn't always do so. For example, the following code doesn't get vectorized:
define void @foo() #0 {
  %1 = load i32* getelementptr inbounds ([1000 x i32]* @a, i32 0, i64 0), align 4
  %2 = load i32* getelementptr inbounds ([1000 x i32]* @b, i32 0, i64 0), align 4
  %3 = add nsw i32 %1, %2
  store i32 %3, i32* getelementptr inbounds ([1000 x i32]* @c, i32 0, i64 0), align 4
  %4 = load i32* getelementptr inbounds ([1000 x i32]* @a, i32 0, i64 1), align 4
  %5 = load i32* getelementptr inbounds ([1000 x i32]* @b, i32 0, i64 1), align 4
  ; Please note that %4 and %5 are swapped in the following line:
  %6 = add nsw i32 %5, %4
  store i32 %6, i32* getelementptr inbounds ([1000 x i32]* @c, i32 0, i64 1), align 4
  %7 = load i32* getelementptr inbounds ([1000 x i32]* @a, i32 0, i64 2), align 4
  %8 = load i32* getelementptr inbounds ([1000 x i32]* @b, i32 0, i64 2), align 4
  %9 = add nsw i32 %7, %8
  store i32 %9, i32* getelementptr inbounds ([1000 x i32]* @c, i32 0, i64 2), align 4
  %10 = load i32* getelementptr inbounds ([1000 x i32]* @a, i32 0, i64 3), align 4
  %11 = load i32* getelementptr inbounds ([1000 x i32]* @b, i32 0, i64 3), align 4
  %12 = add nsw i32 %10, %11
  store i32 %12, i32* getelementptr inbounds ([1000 x i32]* @c, i32 0, i64 3), align 4
  ret void
}
It might make sense to handle such cases explicitly, like you do for altShuffles.
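For readers who prefer source over IR, the function above corresponds roughly to the following. This is a reconstruction for illustration only and was not part of the original message: the array names and sizes are taken from @a, @b and @c in the IR, and a compiler is not guaranteed to emit the add operands in exactly this order.

```cpp
// Assumed declarations matching the IR's [1000 x i32] globals @a, @b, @c.
int a[1000], b[1000], c[1000];

// Four adjacent lanes, all additions. Lane 1 has its operands swapped,
// so its left operand is a load from b while its neighbours load from a.
void foo() {
    c[0] = a[0] + b[0];
    c[1] = b[1] + a[1];  // swapped relative to the other lanes
    c[2] = a[2] + b[2];
    c[3] = a[3] + b[3];
}
```

Since integer addition is commutative, swapping the operands of lane 1 back to a[1] + b[1] does not change the result, which is why an explicit reordering step could make all four lanes line up.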