This is an archive of the discontinued LLVM Phabricator instance.

merge consecutive 16-byte loads into one 32-byte load (PR22329)
ClosedPublic

Authored by spatel on Jan 30 2015, 12:00 PM.

Details

Summary

This patch detects consecutive vector loads using the existing EltsFromConsecutiveLoads() logic. This fixes:
http://llvm.org/bugs/show_bug.cgi?id=22329

This patch effectively reverts the tablegen additions of D6492 / http://reviews.llvm.org/rL224344 ...which in hindsight were a horrible hack. :)

The test cases that were added with that patch are simply modified to load from varying offsets of a base pointer. These loads did not match the existing tablegen patterns.

A happy side effect of doing this optimization earlier is that we can now fold the load into a math op where possible; this is shown in some of the updated checks in the test file.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 19050.Jan 30 2015, 12:00 PM
spatel retitled this revision from to merge consecutive 16-byte loads into one 32-byte load (PR22329).
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added reviewers: delena, hfinkel, mkuper, RKSimon.
spatel added a subscriber: Unknown Object (MLST).
mkuper added inline comments.Feb 1 2015, 6:37 AM
lib/Target/X86/X86ISelLowering.cpp
6099 ↗(On Diff #19050)

If I'm reading this correctly, before this change, if we got here, then the size of VT always matched the size of the found consecutive load (VT.getSizeInBits() == EltVt.getSizeInBits() * NumElems).
With this change, I think that no longer holds. The size of the consecutive load we find is LdVT.getSizeInBits() * Elts.size(), but there's no guarantee that this is actually the size of VT. The responsibility for ensuring this condition holds has moved to the caller.

I think we now need an additional check that the sizes indeed match.

13216 ↗(On Diff #19050)

You probably also want to check that Idx is what you expect it to be.

spatel added inline comments.Feb 1 2015, 9:49 AM
lib/Target/X86/X86ISelLowering.cpp
6099 ↗(On Diff #19050)

Previously, I think there was an implicit assumption that each of the Elts was a matching scalar load of VT.getVectorElementType(); this was safe assuming the Elts were extracted from a build_vector.

But yes, I agree. I will add a check inside the loop to confirm that each element matches the fractional size that we're expecting, and I will add a check after the loop to confirm that the cumulative size of the element loads matches the total size of the vector type (VT).

13216 ↗(On Diff #19050)

Agree again - this is too lax. I will add checks to make sure that both insert_subvector indices match what we need.

spatel updated this revision to Diff 19112.Feb 1 2015, 10:10 AM
spatel updated this object.

Added type checks for loads. Added insertion index checks for both subvector instructions.
Thanks, Michael!

mkuper accepted this revision.Feb 3 2015, 2:02 AM
mkuper edited edge metadata.

LGTM

This revision is now accepted and ready to land.Feb 3 2015, 2:02 AM
This revision was automatically updated to reflect the committed changes.