This is an archive of the discontinued LLVM Phabricator instance.

A fix for bug33826
ClosedPublic

Authored by Farhana on Jul 19 2017, 11:47 AM.

Details

Summary

X86Interleave should not try handle cases where the wide-load size is less than the factor*num_elements.

Diff Detail

Repository
rL LLVM

Event Timeline

Farhana created this revision.Jul 19 2017, 11:47 AM
Farhana added a reviewer: DavidKreitzer.
Farhana added subscribers: llvm-commits, RKSimon.
DavidKreitzer added inline comments.Jul 19 2017, 12:30 PM
lib/Target/X86/X86InterleavedAccess.cpp
108 ↗(On Diff #107349)

It is not just a question of profitability. If lowerIntoOptimizedSequence were called for a load instruction that is too small, it would make an incorrect transformation, because the decompose function would generate Factor loads, each of size ShuffleVecSize.

I would also recommend a slightly different fix. Rather than checking the expected shuffle size for both the load & the store, I would check the "expected wide vector size". For loads, that means checking the type of the load. For stores, it means checking the type of the shuffle.

Farhana updated this revision to Diff 107373.Jul 19 2017, 2:04 PM
Farhana added inline comments.
lib/Target/X86/X86InterleavedAccess.cpp
108 ↗(On Diff #107349)

It is not just a question of profitability. If lowerIntoOptimizedSequence were called for a load instruction that is too small, it would make an incorrect transformation, because the decompose function would generate Factor loads, each of size ShuffleVecSize.

But, we could handle it easily by generating dummy vectors. And the reason we are not doing that because it will generate inefficient sequence. So, it boils down to the profitability.

DavidKreitzer added inline comments.Jul 19 2017, 2:43 PM
lib/Target/X86/X86InterleavedAccess.cpp
106 ↗(On Diff #107373)

You could make the ShuffleVecTy check more generic by testing this, which is similar to what you had before adding interleaved store support:

if (WideInstSize != DL.getTypeSizeInBits(ShuffleVecType) * Factor)
  return false;

I get your point about the profitability. I think your comment in the test adequately captures that.

Farhana added inline comments.Jul 20 2017, 6:51 AM
lib/Target/X86/X86InterleavedAccess.cpp
106 ↗(On Diff #107373)

I cannot write the way you suggested, it would be incorrect. We are checking the ShuffleVecType, so it can't be multiplied with Factor.

For store, it will check against 1024 * 4.

Farhana updated this revision to Diff 107504.Jul 20 2017, 7:03 AM
Farhana added inline comments.
lib/Target/X86/X86InterleavedAccess.cpp
106 ↗(On Diff #107373)

But we can make it semi generic by hardcoding the number of elements.

DavidKreitzer added inline comments.Jul 20 2017, 7:56 AM
lib/Target/X86/X86InterleavedAccess.cpp
106 ↗(On Diff #107373)

I cannot write the way you suggested, it would be incorrect. We are checking the ShuffleVecType, so it can't be multiplied with Factor. For store, it will check against 1024 * 4.

But the code in question is guarded by isa<LoadInst>(Inst). At any rate, what you have is fine if you change 256 to ShuffleElemSize * SupportedNumElem.

Farhana updated this revision to Diff 107556.Jul 20 2017, 10:55 AM
Farhana added inline comments.
lib/Target/X86/X86InterleavedAccess.cpp
106 ↗(On Diff #107373)

if (WideInstSize != DL.getTypeSizeInBits(ShuffleVecType) * Factor)

return false;

But the code in question is guarded by isa<LoadInst>(Inst).

But ShuffleVecType is type of shuffles[0] which can be different load vs store. So, comparing WideInstSize with DL.getTypeSizeInBits(ShuffleVecType) * Factor will result in incorrect value.

When it's a store, it will compare with with 1024 * 4.

At any rate, what you have is fine if you change 256 to ShuffleElemSize * SupportedNumElem.

It will produce incorrect result in the current code since ShuffleElemSize is not guaranteed to be 4 at that point.

I like the way it is since all checks are confined in one place, make it look clean. But I don't mind to change it since you prefer the other way.

Farhana updated this revision to Diff 107591.Jul 20 2017, 3:15 PM
Farhana added inline comments.
lib/Target/X86/X86InterleavedAccess.cpp
106 ↗(On Diff #107373)

Regarding 256, I take it back. You are right, we don't need the element size check.

DavidKreitzer accepted this revision.Jul 20 2017, 4:24 PM

Thanks, Farhana. LGTM.

This revision is now accepted and ready to land.Jul 20 2017, 4:24 PM
This revision was automatically updated to reflect the committed changes.