We sometimes see code like this:

Case 1:

%gep = getelementptr i32, i32* %a, <2 x i64> %splat
%ext = extractelement <2 x i32*> %gep, i32 0

or this:

Case 2:

%gep = getelementptr i32, <4 x i32*> %a, i64 1
%ext = extractelement <4 x i32*> %gep, i32 0

where there is only one use of the GEP. In such cases it makes
sense to fold the two together such that we create a scalar GEP:

Case 1:

%ext = extractelement <2 x i64> %splat, i32 0
%gep = getelementptr i32, i32* %a, i64 %ext

Case 2:

%ext = extractelement <4 x i32*> %a, i32 0
%gep = getelementptr i32, i32* %ext, i64 1

This may create further folding opportunities as a result, e.g.
the extract of a splat vector can be completely eliminated. Also,
even for the general case where the vector operand is not a splat
it seems beneficial to create a scalar GEP and extract the scalar
element from the operand. Therefore, in this patch I've assumed
that a scalar GEP is always preferable to a vector GEP and have
added code to unconditionally fold the extract + GEP.
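
As an illustration of the splat case (a hypothetical sketch; the
value names here are illustrative and not taken from the tests),
given a splatted index:

%insert = insertelement <2 x i64> poison, i64 %idx, i32 0
%splat = shufflevector <2 x i64> %insert, <2 x i64> poison, <2 x i32> zeroinitializer
%gep = getelementptr i32, i32* %a, <2 x i64> %splat
%ext = extractelement <2 x i32*> %gep, i32 1

the fold produces a scalar GEP whose index is an extract from
%splat, and since every lane of a splat holds the same value that
extract folds down to %idx, leaving just:

%gep = getelementptr i32, i32* %a, i64 %idx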

I haven't added folds for the case when we have both a vector of
pointers and a vector of indices, since this would require
generating an additional extractelement operation.
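
To illustrate why (again a sketch with hypothetical names), for
something like:

%gep = getelementptr i32, <2 x i32*> %a, <2 x i64> %b
%ext = extractelement <2 x i32*> %gep, i32 0

the scalar form would need two extracts, one for the pointer
operand and one for the index:

%p = extractelement <2 x i32*> %a, i32 0
%i = extractelement <2 x i64> %b, i32 0
%gep = getelementptr i32, i32* %p, i64 %i

so the fold would trade two instructions for three unless both
extracts later simplify.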

Tests have been added here:

Transforms/InstCombine/gep-vector-indices.ll