
[X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads
ClosedPublic

Authored by RKSimon on Jan 29 2016, 10:33 AM.

Details

Summary

This patch adds support for combining consecutive 32-bit loads (with undef elements permitted between them), followed by trailing undef/zero elements, into a single MOVSS load.

Follow up to D16217

Note: I've been looking into correcting the domain for both the MOVSS/MOVD and the MOVSD/MOVQ loads/stores, but am concerned about the number of test changes - is this something that people think is worthwhile? I'd probably have to change many of the tests to ensure that they keep to the intended domain.

Diff Detail

Repository
rL LLVM

Event Timeline

RKSimon updated this revision to Diff 46397.Jan 29 2016, 10:33 AM
RKSimon retitled this revision from to [X86][SSE] Add general 32-bit LOAD + VZEXT_MOVL support to EltsFromConsecutiveLoads.
RKSimon updated this object.
RKSimon added reviewers: qcolombet, spatel, mkuper.
RKSimon set the repository for this revision to rL LLVM.
RKSimon added a subscriber: llvm-commits.
RKSimon updated this revision to Diff 46645.Feb 2 2016, 6:09 AM
RKSimon added a reviewer: delena.

Add 512-bit vector support

delena added inline comments.Feb 2 2016, 12:10 PM
test/CodeGen/X86/merge-consecutive-loads-128.ll
335 ↗(On Diff #46645)

I think that on these architectures we pay an additional cycle for switching from the INT to the FP domain. Can we use movd instead?

test/CodeGen/X86/merge-consecutive-loads-256.ll
531 ↗(On Diff #46645)

This instruction (movss) reads 4 bytes from memory. Does it require 4-byte alignment?

RKSimon added inline comments.Feb 2 2016, 3:07 PM
test/CodeGen/X86/merge-consecutive-loads-128.ll
335 ↗(On Diff #46645)

This was something I mentioned in the summary - adding domain support for MOVSS/MOVD is straightforward, but it has a knock-on effect on a lot of tests: some would need modifying to keep to their original domain, while others we'd let switch. If you think it's worthwhile I'll start looking at this more seriously.

test/CodeGen/X86/merge-consecutive-loads-256.ll
531 ↗(On Diff #46645)

Not unless SSE/AVX alignment checks are enabled - AFAICT LLVM assumes they aren't. We use the alignment of the base pointer, so lowering of the consecutive load is driven by that.

RKSimon updated this revision to Diff 46790.Feb 3 2016, 8:13 AM

Converted the loads to the integer domain - as Elena said, it's the more sensible option for consecutive 32-bit loads.

delena accepted this revision.Feb 3 2016, 11:47 PM
delena edited edge metadata.

LGTM

lib/Target/X86/X86ISelLowering.cpp
5651 ↗(On Diff #46645)

What happens here in 32-bit mode, where i64 is illegal?

5674 ↗(On Diff #46645)

I don't think all these checks are necessary - (VT.getSizeInBits() >= 128) should be enough.

5678 ↗(On Diff #46645)

What happens if you don't hardcode MVT::f32, but instead choose between f32 and i32 according to VT?

lib/Target/X86/X86InstrAVX512.td
3054 ↗(On Diff #46645)

Why is v8i64 not handled in these patterns?

This revision is now accepted and ready to land.Feb 3 2016, 11:47 PM
This revision was automatically updated to reflect the committed changes.

Thanks Elena - I've addressed your comments in a couple of additional commits:

Added 32-bit target tests to make sure that the 64-bit integer loads still happen (this is the point of the VZEXT_LOAD op anyway).

Added float/integer domain handling and removed the over-zealous IsTypeLegal tests.

Note: the v8i64 VZEXT_LOAD patterns are already handled in another part of X86InstrAVX512.td