Page MenuHomePhabricator

DAGCombiner: Extend reduceBuildVecToTrunc to handle non-zero offset
ClosedPublic

Authored by zvi on Jul 20 2017, 2:10 PM.

Details

Summary

Adding support for combining power2-strided build_vector's where the
first build_vectori's operand is extracted from a non-zero index.

Example:

v4i32 build_vector((extract_elt V, 1),

(extract_elt V, 3),
(extract_elt V, 5),
(extract_elt V, 7))

-->
v4i32 truncate (bitcast (shuffle<1,u,3,u,5,u,7,u> V, u) to v4i64)

Diff Detail

Repository
rL LLVM

Event Timeline

zvi created this revision.Jul 20 2017, 2:10 PM
RKSimon added inline comments.Jul 21 2017, 7:31 AM
include/llvm/Target/TargetLowering.h
2782 ↗(On Diff #107583)

It would possibly make sense to change this to

isDesirableToCombineBuildVectorToShuffleTruncate(ArrayRef<int> ShuffleMask, EVT SrcVT, EVT TruncVT)

checking with a general shuffle mask - I'd expect any non-lane crossing shuffle mask to be valid on AVX512.

guyblank added inline comments.Jul 23 2017, 7:57 AM
test/CodeGen/X86/shuffle-strided-with-offset-256.ll
39 ↗(On Diff #107583)

should this be

vpmovdb %zmm0, (%rsi)

same for AVX512VL

guyblank added inline comments.Jul 23 2017, 8:46 AM
include/llvm/Target/TargetLowering.h
2779 ↗(On Diff #107583)

shuffle mask should be <1,u,3,u,5,u,7,u>

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14433 ↗(On Diff #107583)

update comment

14442 ↗(On Diff #107583)

I think you should add
Offset != 0
to the condition

14473 ↗(On Diff #107583)

you can use TLI

lib/Target/X86/X86ISelLowering.cpp
35835 ↗(On Diff #107583)

what about VPERMW for 16-bit elements?

zvi added inline comments.Jul 24 2017, 2:58 PM
include/llvm/Target/TargetLowering.h
2779 ↗(On Diff #107583)

right

2782 ↗(On Diff #107583)

Thanks. Will look into this suggestion.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14433 ↗(On Diff #107583)

ok

14442 ↗(On Diff #107583)

Makes sense. To remain consistent with current flow.

14473 ↗(On Diff #107583)

ok

lib/Target/X86/X86ISelLowering.cpp
35835 ↗(On Diff #107583)

We need to improve lowerBuildVector to make use of VPERMW. Until then we should go with this combine. Will add a TODO.

test/CodeGen/X86/shuffle-strided-with-offset-256.ll
39 ↗(On Diff #107583)

Look like a follow-up work needed here.

zvi added inline comments.Jul 24 2017, 3:19 PM
test/CodeGen/X86/shuffle-strided-with-offset-256.ll
39 ↗(On Diff #107583)

Ah you were referring to the regalloc issue that's been reported in pr32719

zvi edited the summary of this revision. (Show Details)Jul 24 2017, 3:24 PM
zvi updated this revision to Diff 108008.Jul 24 2017, 10:47 PM

Addressing Guy's comments

zvi updated this revision to Diff 108027.Jul 25 2017, 2:39 AM

Follow Simon's suggestion to generalize isDesirableToCombineBuildVectorToShuffleTruncate() to allow it to handle more cases.
The patch still only handles monotonically increasing indices - added TODO's to be revisited in follow-up work.

guyblank edited edge metadata.Jul 25 2017, 4:44 AM

I'm ok with this.

@RKSimon ?

RKSimon edited edge metadata.Jul 25 2017, 1:28 PM

A few minor comments, but looks almost ready.

lib/CodeGen/SelectionDAG/DAGCombiner.cpp
14480 ↗(On Diff #108027)

Construct the truncate.

lib/Target/X86/X86ISelLowering.cpp
35833 ↗(On Diff #108027)

assert for isShuffleMaskLegal to double check?

35844 ↗(On Diff #108027)

You could probably replace all of this with:

if(!is128BitLaneCrossingShuffleMask(SrcVT, ShuffleMask))
  return false;
35847 ↗(On Diff #108027)

Better to do this at the top to early-out if we don't support it. Then just return true at the end.

zvi updated this revision to Diff 108227.Jul 26 2017, 1:56 AM

Implemented Simon's suggestions.

zvi marked 4 inline comments as done.Jul 26 2017, 1:58 AM
RKSimon added inline comments.Jul 26 2017, 3:06 AM
lib/Target/X86/X86ISelLowering.cpp
35810 ↗(On Diff #108227)

No, this should be an early-out:

if (SrcVT.getScalarSizeInBits() == 32 || !Subtarget.hasAVX2())
  return false;
zvi added inline comments.Jul 26 2017, 3:16 AM
lib/Target/X86/X86ISelLowering.cpp
35810 ↗(On Diff #108227)

Whoops this is not what i intended to upload.

zvi updated this revision to Diff 108244.Jul 26 2017, 3:23 AM

The previous revision was a mishap caused by hitting save on a stale copy. Sorry.

RKSimon accepted this revision.Jul 26 2017, 4:13 AM

LGTM with one minor comment

lib/Target/X86/X86ISelLowering.cpp
35813 ↗(On Diff #108244)

Do we need the isSimple() test? It's handled by isShuffleMaskLegal

This revision is now accepted and ready to land.Jul 26 2017, 4:13 AM
zvi added inline comments.Jul 26 2017, 5:24 AM
lib/Target/X86/X86ISelLowering.cpp
35813 ↗(On Diff #108244)

Then in that case i will drop it. Thanks.

zvi updated this revision to Diff 108260.Jul 26 2017, 5:55 AM

Drop the isSimple() test.

This revision was automatically updated to reflect the committed changes.