This is an archive of the discontinued LLVM Phabricator instance.

[CodeGen] Replace the outgoing chain when reusing stores for extractelt expansion.
ClosedPublic

Authored by ab on Mar 9 2015, 1:26 PM.

Details

Summary

This fixes a subtle issue that was introduced in r205153 [0].

When reusing a store for the extractelement expansion (to load directly from it, inserting of going through the stack), later stores to the same location might have overwritten the data we were expecting to extract from. To fix that, we need to explicitly replace the chain going out of the reused store, so that later stores also depend on the generated element-extracting loads.

Here's a small annotated testcase, convoluted as it may be (any simpler and we're able to realize this is just a vector shuffle):

typedef int32_t v4i32 __attribute__ ((vector_size (16)));

void test(int32_t *x, int32_t *y, int i) {
  v4i32 b = (*(v4i32*)y);
  x[i*4 + 1] = b[i*4 + 0];
  x[i*4 + 0] = b[i*4 + 1];
}

Let's assume i is dynamically always 0. Consider the case where x == y. The function then does something like the swap pattern:

b = *x;
x[1] = b[0];
x[0] = b[1];

The compiler used to generate:

movl    8(%ebp), %eax        ; eax <- x
movl    16(%ebp), %ecx       ; ecx <- i
movl    12(%ebp), %edx       ; edx <- y
movaps  (%edx), %xmm0        ; b <- *y

movaps  %xmm0, -40(%ebp)     ; tmp = b
shll    $4, %ecx             ; ecx <- ecx * 4;  ecx = i = 0
leal    -40(%ebp), %edx      ;
movl    (%ecx,%edx), %edx    ; edx <- tmp[0]
movl    %edx, 4(%eax,%ecx)   ; x[1] = tmp[0] = b[0]
movaps  %xmm0, -24(%ebp)     ; tmp2 = b
leal    -24(%ebp), %edx      ;
movl    4(%ecx,%edx), %edx   ; edx <- tmp2[1]
movl    %edx, (%eax,%ecx)    ; x[0] = tmp2[1] = b[1]

The new code however is sometimes problematic:

movl    8(%ebp), %eax        ; eax <- x
movl    16(%ebp), %ecx       ; ecx <- i
movl    12(%ebp), %edx       ; edx <- y
movaps  (%edx), %xmm0        ; b <- *y

movaps  %xmm0, -24(%ebp)     ; tmp = b
shll    $4, %ecx             ; ecx <- ecx * 4;  ecx = i = 0
leal    -24(%ebp), %edx      ;
movl    (%ecx,%edx), %esi    ; esi <- tmp[0]
movl    %esi, 4(%eax,%ecx)   ; 
movl    4(%ecx,%edx), %edx   ; edx <- tmp[0]
movl    %edx, (%eax,%ecx)    ; x[0] = tmp[0]

Broadcasting x[0].

[0] Quoting:

Make use of previously generated stores in SelectionDAGLegalize::ExpandExtractFromVectorThroughStack

When expanding EXTRACT_VECTOR_ELT and EXTRACT_SUBVECTOR using
SelectionDAGLegalize::ExpandExtractFromVectorThroughStack, we store the entire
vector and then load the piece we want. This is fine in isolation, but
generating a new store (and corresponding stack slot) for each extraction ends
up producing code of poor quality. When we scalarize a vector operation (using
SelectionDAG::UnrollVectorOp for example) we generate one EXTRACT_VECTOR_ELT
for each element in the vector. This used to generate one stored copy of the
vector for each element in the vector. Now we search the uses of the vector for
a suitable store before generating a new one, which results in much more
efficient scalarization code.

Diff Detail

Repository
rL LLVM

Event Timeline

ab updated this revision to Diff 21506.Mar 9 2015, 1:26 PM
ab retitled this revision from to [CodeGen] Replace the outgoing chain when reusing stores for extractelt expansion..
ab updated this object.
ab edited the test plan for this revision. (Show Details)
ab added a reviewer: hfinkel.
ab added a subscriber: Unknown Object (MLST).
hfinkel accepted this revision.Mar 9 2015, 2:40 PM
hfinkel edited edge metadata.

LGTM (with simplification mentioned below), thanks for catching this!

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1460 ↗(On Diff #21506)

This code can be:

SmallVector<SDValue, 6> NewLoadOperands(Node->op_begin(), Node->op_end());
NewLoadOperands[0] = Ch;
This revision is now accepted and ready to land.Mar 9 2015, 2:40 PM
This revision was automatically updated to reflect the committed changes.