This is an archive of the discontinued LLVM Phabricator instance.

[X86] WIP support narrowing operations when only a subvector is demanded
Changes Planned, Public

Authored by craig.topper on Aug 12 2017, 11:16 PM.

Details

Summary

This is a work in progress to see whether we can move toward supporting D36454 and other cases where narrowing gives us smaller operations, enabling EVEX->VEX.

There are really two components to this patch: combining two layers of extract_subvector/insert_subvector nodes, and moving a subvector extract through an operation to its inputs.
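The two components can be illustrated with a toy model. This is only a sketch on plain `std::vector<int>` values, not the SelectionDAG code from the patch; `Vec`, `extractSubvector`, `addVec`, `foldNestedExtract` and `narrowAdd` are hypothetical names introduced here, and an elementwise add stands in for any lane-wise operation.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy stand-in for a vector value; this is not an LLVM type.
using Vec = std::vector<int>;

// Hypothetical model of extract_subvector: Len elements starting at Idx.
Vec extractSubvector(const Vec &V, std::size_t Idx, std::size_t Len) {
  assert(Idx + Len <= V.size() && "subvector out of range");
  return Vec(V.begin() + Idx, V.begin() + Idx + Len);
}

// Elementwise add, standing in for any lane-wise binary operation.
Vec addVec(const Vec &A, const Vec &B) {
  assert(A.size() == B.size() && "operand widths must match");
  Vec R(A.size());
  for (std::size_t I = 0; I < A.size(); ++I)
    R[I] = A[I] + B[I];
  return R;
}

// Component 1: two stacked extracts collapse into a single extract whose
// index is the sum of the outer and inner indices.
Vec foldNestedExtract(const Vec &V, std::size_t OuterIdx,
                      std::size_t InnerIdx, std::size_t Len) {
  return extractSubvector(V, OuterIdx + InnerIdx, Len);
}

// Component 2: an extract of a lane-wise operation is rewritten as the
// same operation applied to narrower extracts of its inputs.
Vec narrowAdd(const Vec &A, const Vec &B, std::size_t Idx, std::size_t Len) {
  return addVec(extractSubvector(A, Idx, Len), extractSubvector(B, Idx, Len));
}
```

In this model, `narrowAdd(A, B, Idx, Len)` produces the same lanes as `extractSubvector(addVec(A, B), Idx, Len)` while only ever adding `Len` lanes, which is the point of the narrowing.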

So far this successfully narrows the final reduction operation in the sad and madd tests. Ideally we'd narrow some of the earlier operations as well.

We also need to support matching horizontal binops from target shuffles in order to convert the final operation to HADD.
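For reference, the shuffle pattern that a horizontal add corresponds to can be sketched as follows. This is a toy four-lane model on `std::array<int, 4>`, not the matching code in X86ISelLowering.cpp; `V4`, `hadd`, `shuffle` and `addViaShuffles` are hypothetical names, and the lane layout assumed is the 128-bit one, `[a0+a1, a2+a3, b0+b1, b2+b3]`.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

using V4 = std::array<int, 4>;

// Reference semantics assumed for a four-lane horizontal add:
// result = [a0+a1, a2+a3, b0+b1, b2+b3].
V4 hadd(const V4 &A, const V4 &B) {
  return {A[0] + A[1], A[2] + A[3], B[0] + B[1], B[2] + B[3]};
}

// Two-input shuffle: mask values 0..3 select from A, 4..7 select from B.
V4 shuffle(const V4 &A, const V4 &B, const std::array<int, 4> &Mask) {
  V4 R{};
  for (std::size_t I = 0; I < 4; ++I) {
    assert(Mask[I] >= 0 && Mask[I] < 8 && "mask index out of range");
    R[I] = Mask[I] < 4 ? A[Mask[I]] : B[Mask[I] - 4];
  }
  return R;
}

// The pattern to recognize: an ordinary add whose operands are the
// even-lane and odd-lane shuffles of the same two inputs.
V4 addViaShuffles(const V4 &A, const V4 &B) {
  V4 Even = shuffle(A, B, {0, 2, 4, 6});
  V4 Odd = shuffle(A, B, {1, 3, 5, 7});
  V4 R{};
  for (std::size_t I = 0; I < 4; ++I)
    R[I] = Even[I] + Odd[I];
  return R;
}
```

Here `addViaShuffles` and `hadd` compute the same lanes, so an add fed by these two masks is a candidate for a single horizontal add.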

Event Timeline

craig.topper created this revision. Aug 12 2017, 11:16 PM

Fixed a couple of issues and got 2 adds narrowed now.

Looks like there's still an issue where we have equivalent extracts from the next add, but one is sandwiched in bitcasts and the other isn't. So they don't get CSEd, and they then inflate the use count of the add.
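To make the "equivalent extracts" point concrete, here is a toy byte-level sketch, assuming (purely for illustration) a 256-bit value viewed as 8 x i32 and as 16 x i16; the actual types in the failing case are not stated above. `extractUpperDirect` and `extractUpperViaBitcasts` are hypothetical names. Both forms yield the same bits, but they are different chains of operations, so a CSE that compares nodes structurally would presumably keep both.

```cpp
#include <array>
#include <cstdint>
#include <cstring>

using V8i32 = std::array<std::uint32_t, 8>;
using V4i32 = std::array<std::uint32_t, 4>;
using V16i16 = std::array<std::uint16_t, 16>;
using V8i16 = std::array<std::uint16_t, 8>;

// Form 1: extract the upper half directly in the original element type.
V4i32 extractUpperDirect(const V8i32 &X) {
  V4i32 R{};
  std::memcpy(R.data(), X.data() + 4, 4 * sizeof(std::uint32_t));
  return R;
}

// Form 2: the same extract sandwiched between two bitcasts.
V4i32 extractUpperViaBitcasts(const V8i32 &X) {
  V16i16 Wide{}; // bitcast 8 x i32 -> 16 x i16 (reinterpret the same bytes)
  std::memcpy(Wide.data(), X.data(), 8 * sizeof(std::uint32_t));
  V8i16 Half{}; // extract elements 8..15, i.e. the upper 128 bits
  std::memcpy(Half.data(), Wide.data() + 8, 8 * sizeof(std::uint16_t));
  V4i32 R{}; // bitcast 8 x i16 -> 4 x i32
  std::memcpy(R.data(), Half.data(), 4 * sizeof(std::uint32_t));
  return R;
}
```

Since both functions copy the same upper 16 bytes, their results are equal for every input. Canonicalizing one form into the other would let the two extracts merge, so the add would no longer appear to have an extra user.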

RKSimon added inline comments. Aug 13 2017, 6:21 AM
lib/Target/X86/X86ISelLowering.cpp
35535

Should this be in DAGCombiner, protected with an isExtractSubvectorCheap call?

35587

Minor observation: we're assuming a constant index above (IdxVal), but here we're checking for it.

RKSimon edited edge metadata. Aug 13 2017, 6:23 AM

> We also need to support matching horizontal binops from target shuffles in order to convert the final operation to HADD.

For reference: PR34110 and PR34111 suggest the x86 horizontal binop code needs work.

craig.topper planned changes to this revision. Dec 14 2017, 9:47 AM
RKSimon added a subscriber: spatel.

Adding @spatel to this old patch, as he has looked at similar codegen issues recently.

spatel added a comment. Jan 9 2019, 9:56 AM

We did get semi-generic vector narrowing of binops with:
D53784
D54392

But it doesn't handle exactly the same set of ops as shown here.