This is an archive of the discontinued LLVM Phabricator instance.

Fix cycle in selection DAG introduced by extractelement legalization
ClosedPublic

Authored by rob.lougher on Dec 8 2015, 4:32 AM.

Details

Summary

Compiling the following code with optimization will cause the compiler to fail with an assertion.

typedef float alpha __attribute__((__vector_size__(16)));

float foo(int& i, alpha& v) {
  v *= v;
  return v[i];
}

Overran sorted position:

t15: ch,glue = CopyToReg t22:1, Register:f32 %XMM0, t22
  t14: f32 = Register %XMM0
  t22: f32,ch = load<LD4[<unknown>]> t9, t21, undef:i64
    t21: i64 = add t24, t4
      t24: i64 = shl t17, Constant:i8<2>
        t17: i64,ch = load<LD4[%i](tbaa=<0x5eab5d8>), sext from i32> t22:1, t2, undef:i64
          t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
            t1: i64 = Register %vreg0
          t6: i64 = undef
        t23: i8 = Constant<2>
      t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
        t3: i64 = Register %vreg1
    t6: i64 = undef

Checking if this is due to cycles
Detected cycle in SelectionDAG
Offending node:

t22: f32,ch = load<LD4[<unknown>]> t9, t21, undef:i64
  t21: i64 = add t24, t4
    t24: i64 = shl t17, Constant:i8<2>
      t17: i64,ch = load<LD4[%i](tbaa=<0x5eab5d8>), sext from i32> t22:1, t2, undef:i64
        t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
          t1: i64 = Register %vreg0
        t6: i64 = undef
      t23: i8 = Constant<2>
    t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
      t3: i64 = Register %vreg1
  t6: i64 = undef

This is the IR for the test case:

%1 = load <4 x float>, <4 x float>* %v, align 16
%mul = fmul <4 x float> %1, %1
store <4 x float> %mul, <4 x float>* %v, align 16
%2 = load i32, i32* %i, align 4
%vecext = extractelement <4 x float> %mul, i32 %2
ret float %vecext

During selection DAG legalization, the extractelement is replaced with a load instruction. To do this, a temporary store to the stack is used unless an existing store is found that can be re-used. In this case, the store is found.

After creating the load the code replaces the chain going out of the store with the one going out of the new load. The code does this to ensure that any stores that must take place after the store happens after the load, else the value might be overwritten before it is loaded.

The problem is, if the extractelement index is dependent on the store replacing the chain will introduce a cycle in the selection DAG (the load uses the index, and by replacing the chain we will make the index dependent on the load).

This patch is a simple fix for the problem, and adds an additional dependency test when searching for a store. This is conservative as we may end up creating an unnecessary extra store to the stack if another store cannot be found (as in this case). However, the logic required to calculate the correct dependency order is complex (see below), and the situation is not expected to occur very often.

Opinions on the patch are welcome. If the view is that the conservative approach is not appropriate I can try and implement the logic below and submit that instead.

Non-conservative fix

The change that introduced the problem is r231721 which added the code to update the dependency chain. As already stated, previously there was no dependency between the new load and any other store to the same location, meaning the load could occur after the store, getting a wrong value.

The problem is, the change always replaces the chain irrespective of whether there are any other stores. In the testcase above we have no additional stores and so the dependency chain does not need to be changed. However, it isn't as easy as simply scanning the uses of the re-used store and only updating the stores as we may have a load which is followed in the chain by a store. This is demonstrated by the testcase for r231721 which has multiple extractelements. As soon as an extractelement is rewritten into a load, the re-used store becomes followed in the chain by a load and then a store (all 4 extractelements end up chained between the original stores).

So we need to traverse the dependency chain looking for stores, and only replace the chain if a store occurs. However, this is also not sufficient. We can modify the testcase above:

typedef float alpha __attribute__((__vector_size__(16)));

float foo(int& i, alpha& v) {
  v *= v;
  return v[i++];
}

To create the following selection DAG:

Initial selection DAG: BB#0 '_Z3fooRiRDv4_f:entry'
SelectionDAG has 20 nodes:
  t0: ch = EntryToken
  t2: i64,ch = CopyFromReg t0, Register:i64 %vreg0
  t4: i64,ch = CopyFromReg t0, Register:i64 %vreg1
  t5: i64 = Constant<0>
  t7: v4f32,ch = load<LD16[%v](tbaa=<0x5990268>)> t0, t4, undef:i64
  t8: v4f32 = fmul t7, t7
    t9: ch = store<ST16[%v](tbaa=<0x5990268>)> t7:1, t8, t4, undef:i64
  t10: i32,ch = load<LD4[%i](tbaa=<0x59926b8>)> t9, t2, undef:i64
      t12: i32 = add t10, Constant:i32<1>
    t13: ch = store<ST4[%i](tbaa=<0x59926b8>)> t10:1, t12, t2, undef:i64
      t14: i64 = sign_extend t10
    t15: f32 = extract_vector_elt t8, t14
  t18: ch,glue = CopyToReg t13, Register:f32 %XMM0, t15
  t19: ch = X86ISD::RET_FLAG t18, TargetConstant:i16<0>, Register:f32 %XMM0, t18:1

The store (t9) that will be re-used for the extract_vector_elt is now followed in the dependency chain by a load and a store (t9->t10->t13). By the previous logic we need to replace the chain going out of the store. However, this will get the cycle in the DAG as in the original testcase, so we haven't solved the problem.

In this case we can change the dependency chain and insert the new load between t10 and t13. The new load will occur before the store (t13) maintaining the store ordering but after the load so avoiding the cycle in the DAG.

The problem is, I can see that it is theoretically possible to have multiple chains emanating from the re-used store all of which have a problematic load/store. We can't fix all of these chains by inserting the new load between the load and the store.

Diff Detail

Event Timeline

rob.lougher updated this revision to Diff 42160.Dec 8 2015, 4:32 AM
rob.lougher retitled this revision from to Fix cycle in selection DAG introduced by extractelement legalization.
rob.lougher updated this object.
rob.lougher added reviewers: hfinkel, ab.
rob.lougher updated this object.
rob.lougher added a subscriber: llvm-commits.
rob.lougher updated this object.Dec 8 2015, 4:35 AM
ab edited edge metadata.Dec 8 2015, 10:56 AM

The fix seems safe to me, though I'm a little worried about hasPredecessor().

lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1488

How about using hasPredecessorHelper?

Thanks for the review! Yes, I can use hasPredecessorHelper, which should make it less expensive when there is more than one store to check. I'll post an updated patch...

rob.lougher updated this revision to Diff 42281.Dec 9 2015, 2:48 AM
rob.lougher edited edge metadata.

Updated patch to use hasPredecessorHelper.

rob.lougher marked an inline comment as done.Dec 9 2015, 2:52 AM
rob.lougher added inline comments.
lib/CodeGen/SelectionDAG/LegalizeDAG.cpp
1488

Done

hfinkel accepted this revision.Dec 9 2015, 5:32 AM
hfinkel edited edge metadata.

LGTM.

I suspect that we already have the "real" solution to this problem implemented: When use of AA is enabled by the backend, then we'll have found a better chain for the load of the index and the situation will not occur.

This revision is now accepted and ready to land.Dec 9 2015, 5:32 AM
This revision was automatically updated to reflect the committed changes.