This is an archive of the discontinued LLVM Phabricator instance.

[x86] allow 64-bit extracted vector element integer stores on a 32-bit system
ClosedPublic

Authored by spatel on Apr 20 2015, 4:50 PM.

Details

Summary

With SSE2, we can generate a 'movq' or other 64-bit store op on a 32-bit system even though 64-bit integers are not legal types.

So instead of producing this:

pshufd	$229, %xmm0, %xmm1      ## xmm1 = xmm0[1,1,2,3]
movd	%xmm0, (%eax)
movd	%xmm1, 4(%eax)

We can do:

movq %xmm0, (%eax)

This is a fix for the problem noted in D7296.

Diff Detail

Repository
rL LLVM

Event Timeline

spatel updated this revision to Diff 24077.Apr 20 2015, 4:50 PM
spatel retitled this revision from to [x86] allow 64-bit extracted vector element integer stores on a 32-bit system.
spatel updated this object.
spatel edited the test plan for this revision. (Show Details)
spatel added a subscriber: Unknown Object (MLST).
ab added a subscriber: ab.Apr 20 2015, 5:58 PM
ab added inline comments.
lib/Target/X86/X86ISelLowering.cpp
22941–22942 ↗(On Diff #24077)

Isn't v2f64 too specific? What if you have a 256bit vector? (I haven't seen any related check in the function)

spatel added inline comments.Apr 20 2015, 6:15 PM
lib/Target/X86/X86ISelLowering.cpp
22941–22942 ↗(On Diff #24077)

Yes, you're absolutely right. This won't work as-is for an AVX-capable target in 32-bit mode. Nice catch!

spatel updated this revision to Diff 24092.Apr 20 2015, 6:58 PM

Patch updated:

  1. Fixed to handle more than just 128-bit vectors.
  2. Added test case to make sure a 256-bit AVX vector is processed; this test case crashed with the previous version of the patch.
spatel added a reviewer: ab.Apr 20 2015, 6:58 PM
ab accepted this revision.Apr 21 2015, 9:37 AM
ab edited edge metadata.

A couple nits, but LGTM, thanks!

lib/Target/X86/X86ISelLowering.cpp
22939–22944 ↗(On Diff #24092)

Very subjective, but I feel like there's a couple variables too many. You decide.

22946–22950 ↗(On Diff #24092)

Reflow formatting?

test/CodeGen/X86/i64-mem-copy.ll
27–28 ↗(On Diff #24092)

Perhaps also test a lane index != 0?

This revision is now accepted and ready to land.Apr 21 2015, 9:37 AM
This revision was automatically updated to reflect the committed changes.

Thanks, Ahmed - r235460.
I changed the last test case to use a non-zero index and fixed the formatting.
Re: local vars - if I have my choice, I add as many locals as possible to make for easier debugging, so I left that part as-is. :)