Page MenuHomePhabricator

[X86] EltsFromConsecutiveLoads - support common source loads
ClosedPublic

Authored by RKSimon on Jul 11 2019, 3:00 AM.

Details

Summary

This patch enables us to find the source loads for each element, splitting them into a Load and ByteOffset, and attempts to recognise consecutive loads that are in fact from the same source load.

A helper function, FindEltLoadSrc, recurses through to find a LoadSDNode and determines the element's byte offset within it. When attempting to match consecutive loads, byte offsetted loads then attempt to matched against a previous load that has already been confirmed to be a consecutive match.

Next step towards PR16739 - after this we just need to account for shuffling/repeated elements to create a vector load + shuffle.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon created this revision.Jul 11 2019, 3:00 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 11 2019, 3:00 AM

Does this need to do anything to ensure that there are no interferences, in the sense of non-known-noalias writes?

Does this need to do anything to ensure that there are no interferences, in the sense of non-known-noalias writes?

That's what areNonVolatileConsecutiveLoads handles no?

Does this need to do anything to ensure that there are no interferences, in the sense of non-known-noalias writes?

That's what areNonVolatileConsecutiveLoads handles no?

It isn't fully obvious whether that only checks that the loads are from sequential locations in memory,
or whether it also checks that it is *legal* to perform those loads as one, at least to me.
I.e. i'm expecting this doesn't fold:

define <4 x float> @load_float4_float3_as_float2_float__with_write(<4 x float>* nocapture readonly dereferenceable(16)) {
  %2 = bitcast <4 x float>* %0 to <2 x float>*
  %3 = load <2 x float>, <2 x float>* %2, align 4
  %4 = extractelement <2 x float> %3, i32 0
  %5 = insertelement <4 x float> undef, float %4, i32 0
  %6 = extractelement <2 x float> %3, i32 1
  %7 = insertelement <4 x float> %5, float %6, i32 1
  %8 = getelementptr inbounds <4 x float>, <4 x float>* %0, i64 0, i64 2
  store float 42.0, float* %8 ; !!!
  %9 = load float, float* %8, align 4
  %10 = insertelement <4 x float> %7, float %9, i32 2
  ret <4 x float> %10
}

Is that what if (LD->getChain() != Base->getChain()) return false; does?

Is that what if (LD->getChain() != Base->getChain()) return false; does?

Yes, chains will handle these kinds of dependencies - do you want me to add that test ?

Is that what if (LD->getChain() != Base->getChain()) return false; does?

Yes, chains will handle these kinds of dependencies

Great to know! No other comments from me.

  • do you want me to add that test ?

Hmm, i'm not fully sure it's really useful as-is - in *that* case we can just reorder that store before these loads, so *that* case could still be folded.

Any more comments? I'm keen to try and get PR16739 fixed before the release branch.

spatel accepted this revision.Jul 18 2019, 5:41 AM

It's worth noting here in the review that this patch depends on the dereferenceable attribute (see D64205), and that attribute could change meaning as part of the larger changes related to the Attributor pass (D63243).
Based on current definitions, I think this is correct and allowable, so LGTM.

lib/Target/X86/X86ISelLowering.cpp
7505 ↗(On Diff #209154)

formatting:
FindElt... --> findElt

7519–7521 ↗(On Diff #209154)

Inconsistent methods for getting the constant (APInt vs. uin64_t). I don't think a logic difference is possible given other limits of LLVM, so go with getConstantOperandVal() in both places?

This revision is now accepted and ready to land.Jul 18 2019, 5:41 AM
This revision was automatically updated to reflect the committed changes.

This broke compilation for me, with https://martin.st/temp/simd_cmp_avx2.cpp, built with clang -target i686-w64-mingw32 -c -O3 -mavx2 simd_cmp_avx2.cpp, triggering failed asserts. I can file a proper bug later if necessary.

This broke compilation for me, with https://martin.st/temp/simd_cmp_avx2.cpp, built with clang -target i686-w64-mingw32 -c -O3 -mavx2 simd_cmp_avx2.cpp, triggering failed asserts. I can file a proper bug later if necessary.

Filed as a bug at https://bugs.llvm.org/show_bug.cgi?id=42727.