This is an archive of the discontinued LLVM Phabricator instance.

Redo store splitting in CodeGenPrepare
ClosedPublic

Authored by wmi on Oct 24 2016, 10:10 AM.

Details

Summary

This is a succeeding patch for https://reviews.llvm.org/D22840 to address the issue when a value to be merged into an int64 pair is in a different BB. The issue was originally described here: http://lists.llvm.org/pipermail/llvm-commits/Week-of-Mon-20160912/390582.html (Sorry to delay the patch for a long time, because I took a long vacation).

The fix is to redo the store splitting in CodeGenPrepare, so we can match the pattern across multiple BBs or move some instructions into the same BB.

The patch changing the target query interface was put separately here: https://reviews.llvm.org/D24707

Diff Detail

Repository
rL LLVM

Event Timeline

wmi updated this revision to Diff 75604.Oct 24 2016, 10:10 AM
wmi retitled this revision from to Redo store splitting in CodeGenPrepare.
wmi updated this object.
wmi added a reviewer: chandlerc.
wmi set the repository for this revision to rL LLVM.
wmi added subscribers: llvm-commits, davidxl, majnemer, arsenm.
chandlerc added inline comments.Nov 22 2016, 11:56 AM
lib/CodeGen/CodeGenPrepare.cpp
5218–5233 ↗(On Diff #75604)

Have you looked at doing this with the IR pattern match library? (I should have mentioned this earlier, sorry.) Check out the PatternMatch.h header.

Something like:

unsigned HalfValBitSize = ...;
Value *Lo, *Hi;
if (match(SI.getValueOperand(),
          m_c_Or(m_OneUse(m_Shl(m_ZExt(m_Value(Hi)), m_SpecificInt(HalfValBitSize))),
                 m_ZExt(m_Value(Lo)))))

I forget the exact syntax for matching zero extends, but I think you can capture most of this in a very small number of lines of code.

5219 ↗(On Diff #75604)

Why not go ahead and handle cases within a BB?

wmi added inline comments.Nov 23 2016, 1:52 PM
lib/CodeGen/CodeGenPrepare.cpp
5218–5233 ↗(On Diff #75604)

I rewrite it using pattern match library. It is really awesome. The code becomes much simpler.

5219 ↗(On Diff #75604)

I forgot the exact reason I added it. I remove the restriction and add a testcase for it.

wmi updated this revision to Diff 79146.Nov 23 2016, 1:52 PM

Address Chandler's comments.

chandlerc added inline comments.Nov 23 2016, 2:07 PM
lib/CodeGen/CodeGenPrepare.cpp
5311–5312 ↗(On Diff #79146)

No need to initialize these. Leaving them uninitialized makes it easier for MSan to find bugs if we try to use them despite the match failing. (Or a bug in the match that fails to set them.)

5342–5350 ↗(On Diff #79146)

What about just walking LValue and HValue back across any bitcast instructions? That should make it easier to get the type for the cost query, and then make this simpler as you can directly store the pre-bitcast value.

I also wouldn't worry about how many uses of the bitcast there are either, as bitcasts are expected to be free. As one example, if there are 3 uses of the bitcast, all to do merged stores, we should be willing to unmerge all three of them.

5352 ↗(On Diff #79146)

Just use the type of the input value rather than re-computing?

Also, you can probably just use a lambda to create the store and then call it for each of the inputs?

wmi added inline comments.Nov 23 2016, 3:03 PM
lib/CodeGen/CodeGenPrepare.cpp
5311–5312 ↗(On Diff #79146)

Ok, fixed.

5342–5350 ↗(On Diff #79146)

I am not very sure I understand correctly what you mean here, but I remove the hasOneUse limitation. Please take a look whether it looks the same as what is in your mind.

5352 ↗(On Diff #79146)

Input can be i16 and it is extended to i64 before the "shl + or" bitwise operations. The type of splitted store here is i32. So we cannot necessarily get the i32 from input type.

I created a lambda and called it for each input. It looks better.

wmi updated this revision to Diff 79157.Nov 23 2016, 3:08 PM

Address Chandler's comments (The second round).

chandlerc accepted this revision.Nov 23 2016, 3:14 PM
chandlerc edited edge metadata.

LGTM with a nit below.

I can send you a patch that tries to simplify the bitcast handling and see if you like that better. Seems much easier to explain with code.

lib/CodeGen/CodeGenPrepare.cpp
5347 ↗(On Diff #79157)

CreateSplittedStore -> CreateSplitStore

This revision is now accepted and ready to land.Nov 23 2016, 3:14 PM
majnemer added inline comments.Nov 25 2016, 9:07 PM
lib/CodeGen/CodeGenPrepare.cpp
5300–5301 ↗(On Diff #79157)

This looks wrong. Shouldn't it be getTypeStoreSizeInBits instead of getTypeSizeInBits?

5303 ↗(On Diff #79157)

Start with an uppercase letter.

5320–5322 ↗(On Diff #79157)

Ditto.

5327–5328 ↗(On Diff #79157)

auto *

wmi added inline comments.Nov 28 2016, 10:15 AM
lib/CodeGen/CodeGenPrepare.cpp
5300–5301 ↗(On Diff #79157)

They are the same here because the type of the store value must have power of 2 size if it is a merged store. But you remind me to add some testcases: @int31_float_pair, @int15_float_pair, @int7_float_pair added in the testcase.

5303 ↗(On Diff #79157)

Fixed.

5327–5328 ↗(On Diff #79157)

Fixed.

5347 ↗(On Diff #79157)

Fixed.

wmi updated this revision to Diff 79422.Nov 28 2016, 10:17 AM
wmi edited edge metadata.

Address David's comment.

majnemer added inline comments.Nov 28 2016, 11:36 AM
lib/CodeGen/CodeGenPrepare.cpp
5300–5301 ↗(On Diff #79157)

But aren't i1, i2 and i4 powers of two?

If we stored an i1, HalfValBitSize would be zero which sounds problematic.

wmi added inline comments.Nov 28 2016, 2:05 PM
lib/CodeGen/CodeGenPrepare.cpp
5300–5301 ↗(On Diff #79157)

The store val will at least be i2 because it is a merged val from two smaller vals.

If the store val is an i2 which are combined from an {i1, i1} pair, and we use getTypeStoreSizeInBits to compute the HalfValBitSize, the HalfValBitSize will be 4. It means the split store size will be 4 bits. It is not what we expect. We expect the type of split store is i1.

I cannot add add an i1_i1_pair test because now the target query will return false for int pair. But I have verified that the i1_i1_pair test worked correctly to use getTypeSizeInBits by disabling the target query temporarily.

chandlerc added inline comments.Nov 28 2016, 2:13 PM
lib/CodeGen/CodeGenPrepare.cpp
5300–5301 ↗(On Diff #79157)

I don't think storing i1s makes any sense here.

I think you should add a check that the type store size == the type size both before and after splitting and not split unless that is satisfied, regardless of what the target says.

And to make this easier to test, I suggest adding a flag that forces us to split everything we can split. Otherwise covering interesting inputs is too target dependent.

majnemer added inline comments.Nov 28 2016, 3:47 PM
lib/CodeGen/CodeGenPrepare.cpp
5300–5301 ↗(On Diff #79157)

I'm still not sure why we would want to use TypeSizeInBits instead of the store size...
What if the type is an i63? We would actually store 8 bytes...

wmi updated this revision to Diff 79471.Nov 28 2016, 3:49 PM

Address Chandler and David's comments.

  • Add an option to force store splitting.
  • Add int pair related tests
  • early exit if getTypeStoreSizeInBits and getTypeSizeInBits return different values for store type before and after split
chandlerc added inline comments.Nov 28 2016, 4:35 PM
lib/CodeGen/CodeGenPrepare.cpp
5300–5301 ↗(On Diff #79157)

I think it is at least reasonable to start with type size == store size. Among other things, when that isn't true, the shifting and extension logic will be at least a little more complicated.

Still, Wei, this might make sense as a follow-up patch to work on to extend this to handle more complicated merged stores. As a potentially useful set of examples you could look at the generated code for:

struct S {
  unsigned x : 27;
};

std::pair<S, float> g1();
std::pair<float, S> g2();

void f(std::pair<float, unsigned> &result) {
  auto p1 = g1();
  auto p2 = g2();
  result.first = p1.second + p2.first;
  result.second = p1.first + p2.second;
}

Or something similar... But I'm happy to do this as a follow-up generalization now that the correctness issues are addressed.

That seem reasonable David?

test/CodeGen/X86/split-store.ll
100–119 ↗(On Diff #79471)

I would also include more negative tests:

  1. non-symmetric merges
  2. non-power-of-two merged store sizes where type size == store size (i24, i48, etc)
  3. non-power-of-two merged store sizes where type size != store size (i14)
wmi updated this revision to Diff 79493.Nov 28 2016, 6:30 PM

Add tests suggested by Chandler.

  1. non-symmetric merges: tests: int31_int17_pair (splitted), int7_int3_pair (splitted).
  2. non-power-of-two merged store sizes where type size == store size (i24, i48, etc): tests: int24_int24_pair (splitted), int12_int12_pair (not splitted)
  3. non-power-of-two merged store sizes where type size != store size (i14): test: int7_int7_pair (not splitted)

LGTM

Sorry I lost track of this, please feel free to ping more aggressively when not getting reviews! =D

I'm still interested in trying to write this in a way that is a bit more general, but I think it is fine to do that in follow-up patches.

As for how to get test cases, I would take the idea behind pseudo-code like I showed and write the LLVM IR by hand to trigger that case.

This revision was automatically updated to reflect the committed changes.