This is an archive of the discontinued LLVM Phabricator instance.

[DAGCombiner] reduce extract subvector of concat
ClosedPublic

Authored by spatel on Jan 7 2020, 1:58 PM.

Details

Summary

If we are extracting a chunk of a vector that's smaller than an operand of the concatenated vector operand, we can extract directly from one of those original operands.
This is another suggestion from PR42024:
https://bugs.llvm.org/show_bug.cgi?id=42024#c2

But I'm not sure yet if it will make any difference on those patterns. It seems to help a few existing AVX512 tests though.

Most of the code diff here is refactoring, so I can make that a preliminary commit if preferred.

Diff Detail

Event Timeline

spatel created this revision.Jan 7 2020, 1:58 PM
Herald added a project: Restricted Project. · View Herald TranscriptJan 7 2020, 1:58 PM

IMHO some NFC cleanup wouldn't decrease readability of the diff

RKSimon accepted this revision.Jan 8 2020, 1:21 AM

LGTM - some of NFC refactoring as pre-commits would make sense to separate it from the new features.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
18576

If we're peeking into any concat can we guarantee this or should we just wrap in an if() instead?

This revision is now accepted and ready to land.Jan 8 2020, 1:21 AM
spatel marked an inline comment as done.Jan 8 2020, 6:33 AM

LGTM - some of NFC refactoring as pre-commits would make sense to separate it from the new features.

Thanks - yes, I'll push the cosmetic cleanup and update here.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
18576

This is an edit of the existing assert, and IIUC, it's independent of the concat. We are asserting that the extract_subvector conforms with its definition:

/// EXTRACT_SUBVECTOR(VECTOR, IDX) - Returns a subvector from VECTOR (an
/// vector value) starting with the element number IDX, which must be a
/// constant multiple of the result vector length.
spatel updated this revision to Diff 236826.Jan 8 2020, 7:30 AM

Patch updated:
Rebased after pre-committing the cleanup with rG780ba1f22b53.

lebedev.ri added inline comments.Jan 8 2020, 7:42 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
18591

I'm having a hard time parsing this.

Did we ensure that the original extract only extracts
only from one single concatenated vector?

I.e. that it does not take a new elements from
the end of first concatenated vector, and a few elements
from the beginning of second concatenated vector?
(That case could be implemented as a shuffle if we concatenating the same vector)

spatel marked an inline comment as done.Jan 8 2020, 8:03 AM
spatel added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
18591

I think this goes back to the assert mentioned above for extract_subvector. It's not legal to have something like this:
v2i8 extract_subvec (v16i8 X), 7

Because the index operand must be a multiple of the final vector length (7 % 2 != 0).
So that means it is impossible to extract from more than 1 operand of the concat (because the concat operands are guaranteed to be longer than this extract result).

I can add the above as a code comment if it helps. We could also assert that ExtNumElts % NewExtIdx == 0?

spatel marked an inline comment as done.Jan 8 2020, 8:30 AM
spatel added inline comments.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
18591

Aha - I wasn't being imaginative enough. My assumption was that the concat vector ops are some multiple of the final vector size rather than just larger.

So yes, it is possible to straddle concat ops if we have something like this:
v4i8 extract_subvec (v12i8 concat (v6i8 X), (v6i8 Y), 4

I'll fix the predicate and add an assert.

lebedev.ri requested changes to this revision.Jan 8 2020, 8:52 AM
This revision now requires changes to proceed.Jan 8 2020, 8:52 AM
spatel updated this revision to Diff 236840.Jan 8 2020, 8:53 AM

Patch updated:
Made predicate more restrictive and added asserts.

Patch updated:
Made predicate more restrictive and added asserts.

Can a test case be constructed for that?

(That case could be implemented as a shuffle, if we don't do that already)

Patch updated:
Made predicate more restrictive and added asserts.

Can a test case be constructed for that?

(That case could be implemented as a shuffle, if we don't do that already)

This takes some creativity because we won't generate the needed DAG nodes if the types are too weird and/or mapped to registers in a strange way, but I've found a case that is legal enough to show what would have been a miscompile:
define <4 x i32> @cat_ext_straddle(<6 x i32>* %px, <6 x i32>* %py) {

%x = load <6 x i32>, <6 x i32>* %px
%y = load <6 x i32>, <6 x i32>* %py
%cat = shufflevector <6 x i32> %x, <6 x i32> %y, <12 x i32> <i32 0, i32 1, i32 2, i32 3, i32 4, i32 5, i32 6, i32 7, i32 8, i32 9, i32 10, i32 11>
%ext = shufflevector <12 x i32> %cat, <12 x i32> undef, <4 x i32> <i32 4, i32 5, i32 6, i32 7>
ret <4 x i32> %ext

}

Should be:

movaps	16(%rdi), %xmm0
unpcklpd	(%rsi), %xmm0   ## xmm0 = xmm0[0],mem[0]

But would be miscompiled by the earlier rev of this patch:

movaps	16(%rdi), %xmm0

rG31992a69b808

lebedev.ri accepted this revision.Jan 8 2020, 11:58 AM

...

Thank you!

This revision is now accepted and ready to land.Jan 8 2020, 11:58 AM
This revision was automatically updated to reflect the committed changes.