This is an archive of the discontinued LLVM Phabricator instance.

Remove "mask" operand from shufflevector.
ClosedPublic

Authored by efriedma on Jan 9 2020, 10:33 AM.

Details

Summary
Instead, represent the mask as out-of-line data in the instruction. This
should be more efficient in the places that currently use
getShuffleVector(), and paves the way for further changes to add new
shuffles for scalable vectors.

This doesn't change the syntax in textual IR. And I don't currently plan
to change the bitcode encoding in this patch, although we'll probably
need to do something once we extend shufflevector for scalable types.

I expect that once this is finished, we can then replace the raw "mask"
with something more appropriate for scalable vectors.  Not sure exactly
what this looks like at the moment, but there are a few different ways
we could handle it.  Maybe we could try to describe specific shuffles.
Or maybe we could define it in terms of a function to convert a fixed-length
array into an appropriate scalable vector, using a "step", or something
like that.

The big question here, of course, is "do we want to do this?", and if we
do, "do we want to do this now?". For the immediate needs of SVE, we
don't need fully general shuffles; we have intriniscs for the native
instructions, and we can add additional intrinsics or intrinsic
overloads for the various "extended" patterns we expect to need in the
immediate future that don't involve native SVE vectors, like
concatenating two "<vscale x 2 x i1>" vectors.  The biggest risk here
is we end up essentially throwing this work away when we add another
target that supports vscale vectors.

Diff Detail

Event Timeline

efriedma created this revision.Jan 9 2020, 10:33 AM
Herald added a project: Restricted Project. · View Herald Transcript
efriedma updated this revision to Diff 237129.Jan 9 2020, 10:35 AM

(Reupload with context.)

The big question here, of course, is "do we want to do this?", and if we do, "do we want to do this now?".

Speaking for myself, "yes" and "sooner is better, we should have done this years ago" :-)

Thank you for pushing on this!

llvm/include/llvm/Analysis/ConstantFolding.h
126

Please update the comment to indicate what non-positive values are allowed etc.

This change would also remove the logical gymnastics needed to give special meaning to the mask with respect to undef/poison:
http://llvm.org/docs/LangRef.html#shufflevector-instruction

If the mask not an operand, then it's easier to explain that it doesn't follow the typical operand rules, so +1 for that benefit too.

efriedma updated this revision to Diff 237453.Jan 10 2020, 3:58 PM

Add more comments, fixed the GlobalISel representation of shuffles, misc other cleanups. Still haven't addressed the bitcode reader/writer issues.

xbolva00 added inline comments.
llvm/lib/Analysis/InstructionSimplify.cpp
4446

maybe introduce new helper function to check if mask is "undef"?

efriedma marked an inline comment as done.Jan 10 2020, 4:55 PM
efriedma added inline comments.
llvm/lib/Analysis/InstructionSimplify.cpp
4446

I could introduce ShuffleVectorInst::MaskIsAllUndef and ShuffleVectorInst::MaskIsAllZero, I guess. Not sure how helpful that actually is for a predicate that can be expressed on one line anyway.

efriedma updated this revision to Diff 237463.Jan 10 2020, 5:03 PM

Figured out the changes necessary to get bitcode reading/writing working. ninja check now passes.

Not sure what parts I should split off to get reviewed separately. Probably the GlobalISel changes could be split off without adding too much complexity; not sure what else.

simoll added inline comments.Jan 13 2020, 6:18 AM
llvm/lib/Analysis/InstructionSimplify.cpp
4446

How about introducing the constant UndefMaskElem and changing this to something like:

if (all_of(Mask, [](int Elem) { return Elem == UndefMaskElem; }))

That would make it self-explanatory again and wouldn't expose the specific encoding of undef mask elems.

llvm/lib/IR/Instructions.cpp
1918–1919

With the suggestion: NewMask[i] = UndefMaskElem

efriedma marked an inline comment as done.Jan 13 2020, 3:04 PM
efriedma added inline comments.
llvm/lib/Analysis/InstructionSimplify.cpp
4446

You mean just llvm::UndefMaskElem? I guess that makes sense.

efriedma updated this revision to Diff 237825.Jan 13 2020, 5:42 PM
efriedma retitled this revision from [WIP] Remove "mask" operand from shufflevector. to Remove "mask" operand from shufflevector..

Rebase. Address review comment. More work on bitcode.

I probably need to come up with some testcases for the bitcode encode/decode; we currently don't have great coverage. Otherwise, I think this patch is complete.

efriedma updated this revision to Diff 237830.Jan 13 2020, 6:24 PM
efriedma edited the summary of this revision. (Show Details)

Fixed one more issue. While I'm at it, get rid of the old overload of ConstantExpr::getShuffleVector, which only had a few remaining uses.

spatel added inline comments.Jan 23 2020, 7:07 AM
llvm/include/llvm/IR/Constants.h
1229

an shufflevector -> a shufflevector

1233

an shufflevector -> a shufflevector

llvm/include/llvm/IR/IRBuilder.h
2621

Add an assert that Mask isa<Constant> ?

llvm/include/llvm/IR/Instructions.h
1996–1997

This reads awkwardly to me (if you agree, we can update the LangRef too).
How about:
"For each element of the result vector, the shuffle mask selects an element from one of the input vectors to copy to the result. Non-negative elements in the mask represent an index into the concatenated pair of input vectors. UndefMaskElem (-1) specifies that the result element is undefined."

2024–2025

This comment doesn't add value to me, so I'd just delete it. If we want to keep it, should fix it to be a proper sentence with a capital letter and period.

llvm/include/llvm/IR/PatternMatch.h
1357–1415

Update/add comment:
"Matches ShuffleVectorInst independently of mask value." ?

1384

IIUC, this is meant to replace the current uses of m_Zero() or m_ZeroInt() with shuffle pattern matching, but there's a logic difference because those matchers allow undefs, but this does not? Are we missing unittests to catch that case?

llvm/lib/Bitcode/Writer/ValueEnumerator.cpp
939–941

Indentation looks off-by-1.

llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp
1977

if (auto *SVI = ...)

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
3441

if (auto *SVI = ...)

llvm/lib/ExecutionEngine/Interpreter/Execution.cpp
1890

I've never looked in here before - what happens currently if the index is -1 (undef)?

llvm/lib/IR/ConstantsContext.h
163–164

This comment doesn't add value to me, so I'd just delete it. If we want to keep it, should fix it to be "two operands" and a proper sentence with a capital letter and period.

582

(not familiar with this part)
If we're using Indexes here, do we need to update the code/comments for hasIndices()?

/// Return true if this is an insertvalue or extractvalue expression,
/// and the getIndices() method may be used.
llvm/lib/IR/Instruction.cpp
446

Would have suggested "auto *" here, but really the whole thing should be turned into a switch() as a preliminary cleanup?

llvm/lib/IR/Instructions.cpp
1897–1898

I see this is copied, but indentation seems off.

llvm/lib/Transforms/Scalar/GVN.cpp
302

Use "auto *" in each clause (or convert to switch).

ctetreau added inline comments.Jan 23 2020, 8:41 AM
llvm/include/llvm/IR/IRBuilder.h
2621

cast<T> already does this assert.

2628

[0 .. uint32_MAX] is wider than [0 .. int_MAX]. Assert the values are in range somehow?

llvm/include/llvm/IR/Instructions.h
1996–1997

I also found the wording of the description of shufflevector in the language ref confusing. Since this description seems to be related I thought it worth mentioning.

It's unclear to me how the values specify vector inputs. I assume for two vectors <4 x 18>, that the mask <3, 4, 5, 6> takes the last element of the first vector, and the first three elements of the second vector, but this isn't explicitly stated anywhere.

efriedma marked 5 inline comments as done.Jan 23 2020, 3:00 PM
efriedma added inline comments.
llvm/include/llvm/IR/IRBuilder.h
2628

Either the resulting shuffle mask is invalid (and we'll assert later), or the input contained UINT_MAX, in which case we'll treat it as undef.

llvm/include/llvm/IR/Instructions.h
2024–2025

I think the comment is there to identify the magic number. Maybe I should introduce a named constant ShuffleVectorInst::NumOperands to make that clear.

llvm/include/llvm/IR/PatternMatch.h
1384

I guess we're missing tests, yes; I didn't see any failures.

llvm/lib/ExecutionEngine/Interpreter/Execution.cpp
1890

The interpreter evaluates undef to zero, and therefore it behaves as if every undef is replaced with zero.

Using std::max here preserves the existing behavior.

llvm/lib/IR/ConstantsContext.h
582

As part of constructing ConstantExprs, there's a map from ConstantExprKeyType to ConstantExpr, to ensure each expression is unique. "Indexes" is just a list of integers, so I was reusing it as a shortcut, to try to avoid making an extra ArrayRef. This only impacts the map itself, not the final ConstantExprs.

It looks like I missed that a few places use hasIndices() to check whether the ConstantExprKeyType key stores data in Indexes. I should probably just add the extra member variable to make the logic clear, even if it wastes a tiny bit of stack space.

efriedma marked 21 inline comments as done.Jan 23 2020, 4:11 PM
efriedma updated this revision to Diff 240071.Jan 23 2020, 5:28 PM

Addressed review comments. Rebased.

Chris Tetreault mentioned offline that we might want to change the return value of getShuffleMask() now, to try to save some work later when we add more forms of scalable vector shuffle. (Not actually changing the functionality of any code, just pre-emptively changing the types so we don't have to change code again if it doesn't actually examine the shuffle mask directly.) I'll look into it.

spatel accepted this revision.Jan 24 2020, 5:51 AM

LGTM, but should get a 2nd opinion since I'm not familiar with some of the parts.
Also, please update the related LangRef text for shufflevector in or alongside this patch.

Could add some ascii art to visualize the mask indexing as requested by @ctetreau ?

shufflevector <3 x float> %x, <3 x float> %y, <n x i32> %mask:
+---+---+---+---+---+---+
|     x     |     y     |  vector operands
+---+---+---+---+---+---+
| 0 | 1 | 2 | 0 | 1 | 2 |  per-operand indexing
+---+---+---+---+---+---+
| 0 | 1 | 2 | 3 | 4 | 5 |  shuffle mask indexing
+---+---+---+---+---+---+
llvm/include/llvm/IR/Instructions.h
2047–2048

Can replace the magic -1 with the named "UndefMaskElem" in this comment.

2057

Can replace the magic -1 with the named "UndefMaskElem" in this comment.

This revision is now accepted and ready to land.Jan 24 2020, 5:51 AM

Thanks @efriedma for working on this. The overall approach seems good to me!
Mostly added some nits, with the exception of one question on what to do with a shufflevector with fixed-width mask and scalable source vectors.

llvm/include/llvm/IR/Instructions.h
1991

nit: do we want to make this a member of ShuffleVectorInst, as this is likely the only context in which it will be used?

llvm/lib/Bitcode/Writer/BitcodeWriter.cpp
2705–2706

nit: >80char (please run through clang-format)

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
3451–3452

Should this use m_ZeroMask() ?

llvm/lib/IR/ConstantFold.cpp
886–887

Is it worth adding a m_UndefMask ?

llvm/lib/IR/Constants.cpp
2282

nit: is there a reason for dropping const here?

llvm/lib/IR/ConstantsContext.h
149–150

The number of elements in the result matches the number of elements in the mask, but if we assume 'scalable' from one of the source vectors, this means we make the choice that we cannot extract a fixed-width vector from a scalable vector, e.g.

shufflevector(<vscale x 4 x i32> %in, <vscale x 4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>)

The LangRef does not explicitly call out this case (and it currently fails to compile), but it would provide a way to extract a sub-vector from a scalable vector.
If we want to allow this, we'd also need to decide what the meaning would be of:

shufflevector(<vscale x 4 x i32> %in, <vscale x 4 x i32> undef, <4 x i32> <i32 5, i32 6, i32 7, i32 8>)

which may not be valid if vscale = 1.

Alternatively, we could implement this with an additional extract intrinsic and add the restriction that if the source values are scalable, the mask must be scalable as well. If so, then we should update the LangRef to explicitly disallow the above case.

467

nit: make ConstantExprKeyType a class?

llvm/lib/IR/Instructions.cpp
2012

nit: odd newline

ctetreau added inline comments.Jan 24 2020, 8:34 AM
llvm/lib/IR/ConstantsContext.h
149–150

For the time being, Either this assumption must hold, or a bool must be added that is true if the mask is from a scalable vector.

There are places in the codebase that assume that the mask has a concrete value known at compile time. These sites are currently the sites of bugs, and the fix is to not try to access the mask if the vector is scalable. We need some way of knowing that the mask is scalable, either by the assumption made here, or a queryable bool value.

efriedma marked 5 inline comments as done.Jan 24 2020, 10:01 AM

LGTM, but should get a 2nd opinion since I'm not familiar with some of the parts.

Any specific part you're worried about?

llvm/include/llvm/IR/Instructions.h
1991

ShuffleVectorInst::UndefMaskElem is sort of long. Maybe we don't use it in enough places to matter.

llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp
3451–3452

I don't want to make unrelated functional changes in this patch.

llvm/lib/IR/ConstantFold.cpp
886–887

We currently only use this pattern in three places, and I doubt we're going to add more.

llvm/lib/IR/Constants.cpp
2282

Doesn't really matter either way, but we generally don't add const markings to local variables just because we can.

llvm/lib/IR/ConstantsContext.h
149–150

I'd prefer to just say the result is scalable if and only if the operand is scalable, at least for now. It's not clear what semantics we want for a fixed shuffle of a scalable operand, and allowing scalable splats of fixed vectors doesn't really allow expressing anything new. We can relax it later once we've settled on our general approach to scalable shuffles.

LGTM, but should get a 2nd opinion since I'm not familiar with some of the parts.

Any specific part you're worried about?

I don't know anything about the scalable vector enhancements, but that seems covered now. Nuances of the bitcode serialization are beyond me, but I trust you've stepped through that. So, no worries. :)

lattner removed a subscriber: lattner.Jan 24 2020, 7:01 PM
efriedma updated this revision to Diff 241333.Jan 29 2020, 5:47 PM

Address review comments, rebase.

efriedma updated this revision to Diff 252940.Mar 26 2020, 12:22 PM

Rebase. One minor API addition: m_SplatOrUndefMask matcher

efriedma updated this revision to Diff 252996.Mar 26 2020, 3:28 PM

clang-format

Looks like D76649 landed and added another IRBuilder call that needs updating:

[2974/5437] Building CXX object lib/Target/X86/CMakeFiles/LLVMX86CodeGen.dir/X86PartialReduction.cpp.o
FAILED: lib/Target/X86/CMakeFiles/LLVMX86CodeGen.dir/X86PartialReduction.cpp.o 
CCACHE_DIR=/mnt/disks/ssd0/ccache CCACHE_MAXSIZE=20G CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /usr/bin/clang++  -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Target/X86 -I/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/lib/Target/X86 -I/usr/include/libxml2 -Iinclude -I/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/include -gmlt -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -O3  -fvisibility=hidden    -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT lib/Target/X86/CMakeFiles/LLVMX86CodeGen.dir/X86PartialReduction.cpp.o -MF lib/Target/X86/CMakeFiles/LLVMX86CodeGen.dir/X86PartialReduction.cpp.o.d -o lib/Target/X86/CMakeFiles/LLVMX86CodeGen.dir/X86PartialReduction.cpp.o -c /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/lib/Target/X86/X86PartialReduction.cpp
/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/lib/Target/X86/X86PartialReduction.cpp:387:22: error: call to member function 'CreateShuffleVector' is ambiguous
    Ops[0] = Builder.CreateShuffleVector(Ops[0], Ops[0], {0, 1});
             ~~~~~~~~^~~~~~~~~~~~~~~~~~~
/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/include/llvm/IR/IRBuilder.h:2625:10: note: candidate function
  Value *CreateShuffleVector(Value *V1, Value *V2, ArrayRef<uint32_t> Mask,
         ^
/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/include/llvm/IR/IRBuilder.h:2633:10: note: candidate function
  Value *CreateShuffleVector(Value *V1, Value *V2, ArrayRef<int> Mask,
         ^
/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/include/llvm/IR/IRBuilder.h:2618:10: note: candidate function not viable: cannot convert initializer list argument to 'llvm::Value *'
  Value *CreateShuffleVector(Value *V1, Value *V2, Value *Mask,
This revision was automatically updated to reflect the committed changes.