This is an archive of the discontinued LLVM Phabricator instance.

[SelectionDAG] [ARM CodeGen] Fix chain information of LowerMUL
ClosedPublic

Authored by huihuiz on Apr 4 2017, 12:15 PM.

Details

Summary

In LowerMUL, the chain information is not preserved for the new created Load SDNode.

For example, if a Store alias with one of the operand of Mul. The Load for that operand need to be scheduled before the Store. The dependence is recorded in the chain of Store, in TokenFactor. However, when lowering MUL, the SDNodes for the new Loads for VMULL are not updated in the TokenFactor for the Store. Thus the chain is not preserved for the lowered VMULL.

Diff Detail

Repository
rL LLVM

Event Timeline

huihuiz created this revision.Apr 4 2017, 12:15 PM
jmolloy requested changes to this revision.Apr 5 2017, 12:52 AM
jmolloy added a subscriber: jmolloy.

Hi,

Thanks for this fix. I'm not 100% sure that all of it is needed - comments inline. Also, I particularly like the thoroughness of the commenting in your testcases. Much appreciated! :)

James

lib/Target/ARM/ARMISelLowering.cpp
7328

This looks correct to me and looks like all you should need.

7329

From here below looks unnecessary and wrong.

It looks unnecessary because it's planting an extend of an extending load - which is surely pointless?

And it looks wrong because SkipLoadExtensionForVMULL has this comment:

// We need to create a zextload/sextload. We cannot just create a load
// followed by a zext/zext node because LowerMUL is also run during normal
// operation legalization where we can't create illegal types.
This revision now requires changes to proceed.Apr 5 2017, 12:52 AM
huihuiz added a comment.EditedApr 5 2017, 3:16 PM

Hey James,

  1. In SkipLoadExtensionForVMULL, the creation of zextload/sextload is skipped when the original Load is already 64 bit vector. In both test cases, the original Load is <4 x i16>, later on sext to <4 x i32>.

More clear in the debug output of "llc -mcpu=krait -debug-only=isel,dagcombine lowerMUL-newloal.ll", the original load "t11: v4i16,ch = load<LD8[%vector_ptr0]> t0, t8, undef:i32" and "t27: v4i32 = sign_extend t11" are combined to "t37: v4i32,ch = load<LD8[%vector_ptr0], sext from v4i16> t0, t8, undef:i32"

The Load returned by SkipLoadExtensionForVMULL has the original Load's MemoryVT "v4i16,ch = load<LD8[%vector_ptr0]> t0, t8, undef:i32", not the sext load needed. Because the original Load already has the right type 64bits.

  1. For the new created Load, both the chain information and value need to be updated. Especially for cases when there are multiple uses of the original Load.

See test case func2.
a[i] = b[i] + a[i] * c[i] + a[i];
There are two uses of a[i], one for "a[i]"*c[i], the other for b[i]+a[i]*c[i] + "a[i]", of the same load a[i]

When updating the chain information, in DAG.ReplaceAllUsesOfValueWith(SDValue(LD, 1), newLoad.getValue(1));
The chain of the first use of a[i] is preserved, the "TokenFactor oriLoad, .., .." is updated with TokenFactor newLoad, .., .."
However, the chain for the second use of a[i] is lost. Because after the update, the TokenFactor only record the chain of the new Load for the first use.
The chain for the second use of the original Load should also need to be preserved.

We preserve the chain information of multiple uses of the original Load, by updating both the chain and value information. Basically forcing multiple uses of
the original Load to use the same new Load created, sext/zext when necessary while updating the value.

jmolloy accepted this revision.Apr 6 2017, 2:08 AM

Hi,

It took me a while to grok this, so here's some ascii-art for the record :)

We start with the general case that we have a load whose value may be used by multiple other values:

  |
  LD
 / | \
... ...

Now we introduce a new load, LD', that *may* replace LD for some or all of LD's uses, but we don't know \
or do that transform at this time.

     +
   /  \
  LD   LD'
 / | \
... ...

Transferring value uses (SDValue(LD, 0)) can be done incrementally by optimization passes. But currently\
we have two loads, one of which is unordered with respect to all other memory ops. There are two ways t\
o resolve this:

  1. Create a TokenFactor to link the chains together;

We'd end up with something like:

  +
 / \
LD  LD'
 \ /
  TF
 / | \
.. .. ..

This will work. It does however confuse the optimizer into keeping both loads around (I think it can't d\
etermine that one of the loads can be rewritten in terms of the other:

add     r3, r0, #16
vld1.16 {d17}, [r3:64]
vldr    d20, [r0, #16]
  1. Remove one load and write it in terms of the other

This is what this patch is doing:

   +
   |
  LD'
   |
  EXT   <-- LD = LD' + EXT
 / | \
.. .. ..

My concern with this, as it's planting an EXT, is whether the comment in SkipLoadExtensionForVMULL still\
applies:

// We need to create a zextload/sextload. We cannot just create a load
// followed by a zext/zext node because LowerMUL is also run during normal
// operation legalization where we can't create illegal types.

I think that the planted EXT is fine, because it is extending always from a legal type to another legal \
type.

Therefore, LGTM.

This revision is now accepted and ready to land.Apr 6 2017, 2:08 AM