This is an archive of the discontinued LLVM Phabricator instance.

merge consecutive 16-byte loads into one 32-byte load (PR22329)
ClosedPublic

Authored by spatel on Jan 30 2015, 12:00 PM.

Details

Summary

This patch detects consecutive vector loads using the existing EltsFromConsecutiveLoads() logic. This fixes:
http://llvm.org/bugs/show_bug.cgi?id=22329

This patch effectively reverts the tablegen additions of D6492 / http://reviews.llvm.org/rL224344 ...which in hindsight were a horrible hack. :)

The test cases that were added with that patch are simply modified to load from varying offsets of a base pointer. These loads did not match the existing tablegen patterns.

A happy side effect of doing this optimization earlier is that we can now fold the load into a math op where possible; this is shown in some of the updated checks in the test file.

Diff Detail

Event Timeline

spatel updated this revision to Diff 19050.Jan 30 2015, 12:00 PM
spatel retitled this revision from to merge consecutive 16-byte loads into one 32-byte load (PR22329).
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added reviewers: delena, hfinkel, mkuper, RKSimon.
spatel added a subscriber: Unknown Object (MLST).
mkuper added inline comments.Feb 1 2015, 6:37 AM
lib/Target/X86/X86ISelLowering.cpp
6099

If I'm reading this correctly, before this change, if we got here, then the size of VT always matched the size of the found consecutive load (VT.getSizeInBits() == EltVt.getSizeInBits() * NumElems).
With this change, I think that no longer holds. The size of the consecutive load we find is LdVT.getSizeInBits() * Elts.size(), but there's no guarantee that this is actually the size of VT. The responsibility for ensuring this condition holds has moved to the caller.

I think we now need an additional check that the sizes indeed match.

13216

You probably also want to check that Idx is what you expect it to be.

spatel added inline comments.Feb 1 2015, 9:49 AM
lib/Target/X86/X86ISelLowering.cpp
6099

Previously, I think there was an implicit assumption that each of the Elts was a matching scalar load of VT.getVectorElementType(); this was safe assuming the Elts were extracted from a build_vector.

But yes, I agree. I will add a check inside the loop to confirm that each element matches the fractional size that we're expecting, and I will add a check after the loop to confirm that the cumulative size of the element loads matches the total size of the vector type (VT).

13216

Agree again - this is too lax. I will add checks to make sure that both insert_subvector indices match what we need.

spatel updated this revision to Diff 19112.Feb 1 2015, 10:10 AM
spatel updated this object.

Added type checks for loads. Added insertion index checks for both subvector instructions.
Thanks, Michael!

mkuper accepted this revision.Feb 3 2015, 2:02 AM
mkuper edited edge metadata.

LGTM

This revision is now accepted and ready to land.Feb 3 2015, 2:02 AM
This revision was automatically updated to reflect the committed changes.