This is an archive of the discontinued LLVM Phabricator instance.

[X86] Add a DAG combine to shrink vXi64 gather/scatter indices that are constant with sufficient sign bits to fit in vXi32
ClosedPublic

Authored by craig.topper on Sep 30 2019, 3:36 PM.

Details

Summary

The gather/scatter instructions can implicitly sign extend the indices. If we're operating on 32-bit data, an v16i64 index can force a v16i32 gather to be split in two since the index needs 2 registers. If we can shrink the index to the i32 we can avoid the split. It should always be safe to shrink the index regardless of the number of elements. We have gather/scatter instructions that can use v2i32 index stored in a v4i32 register with v2i64 data size.

I've limited this to before legalize types to avoid creating a v2i32 after type legalization. We could check for it, but we'd also need testing. I'm also only handling build_vectors with no bitcasts to be sure the truncate will constant fold.

Diff Detail

Repository
rL LLVM

Event Timeline

craig.topper created this revision.Sep 30 2019, 3:36 PM
Herald added a project: Restricted Project. · View Herald TranscriptSep 30 2019, 3:36 PM
Herald added a subscriber: hiraditya. · View Herald Transcript
RKSimon added inline comments.Oct 1 2019, 6:33 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
42384 ↗(On Diff #222506)

Does it have to be build vector? General truncations should be pretty good in these cases whatever.

craig.topper marked an inline comment as done.Oct 1 2019, 8:42 AM
craig.topper added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
42384 ↗(On Diff #222506)

No. I guess you could truncate anytime the index element is wider than the data element and the index type isn't legal and is a type that needs to be split or a type that will be widened to a type that will be split. You wouldn't want to truncate if it won't prevent a split since you'd just be adding more code. Unless it was sext/zext which would be removed, but that's already handled below. Is that what you were thinking?

RKSimon added inline comments.Oct 1 2019, 9:37 AM
llvm/lib/Target/X86/X86ISelLowering.cpp
42384 ↗(On Diff #222506)

I was thinking more in terms of the high number of sign bits means that we can always cheaply truncate with VTRUNC/PACKSS

craig.topper marked an inline comment as done.Oct 1 2019, 9:52 AM
craig.topper added inline comments.
llvm/lib/Target/X86/X86ISelLowering.cpp
42384 ↗(On Diff #222506)

So should we always truncate or only in the cases I said? Pack is going to require at least another shuffle to split the data into 2 sources. And VTRUNC is 2 uops by itself. I think PACKSS goes from 1 cycle latency to 3 cycle latency on icelake.

RKSimon accepted this revision.Oct 1 2019, 2:54 PM

LGTM

llvm/lib/Target/X86/X86ISelLowering.cpp
42384 ↗(On Diff #222506)

OK - it becomes a cost issue. We could use isTruncateFree to check but at the moment we don't do anything for vectors. I'm happy if we just leave it as a TODO comment for now.

This revision is now accepted and ready to land.Oct 1 2019, 2:54 PM
This revision was automatically updated to reflect the committed changes.