Instead, represent the mask as out-of-line data in the instruction. This should be more efficient in the places that currently use getShuffleVector(), and paves the way for further changes to add new shuffles for scalable vectors. This doesn't change the syntax in textual IR. And I don't currently plan to change the bitcode encoding in this patch, although we'll probably need to do something once we extend shufflevector for scalable types. I expect that once this is finished, we can then replace the raw "mask" with something more appropriate for scalable vectors. Not sure exactly what this looks like at the moment, but there are a few different ways we could handle it. Maybe we could try to describe specific shuffles. Or maybe we could define it in terms of a function to convert a fixed-length array into an appropriate scalable vector, using a "step", or something like that. The big question here, of course, is "do we want to do this?", and if we do, "do we want to do this now?". For the immediate needs of SVE, we don't need fully general shuffles; we have intriniscs for the native instructions, and we can add additional intrinsics or intrinsic overloads for the various "extended" patterns we expect to need in the immediate future that don't involve native SVE vectors, like concatenating two "<vscale x 2 x i1>" vectors. The biggest risk here is we end up essentially throwing this work away when we add another target that supports vscale vectors.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
The big question here, of course, is "do we want to do this?", and if we do, "do we want to do this now?".
Speaking for myself, "yes" and "sooner is better, we should have done this years ago" :-)
Thank you for pushing on this!
llvm/include/llvm/Analysis/ConstantFolding.h | ||
---|---|---|
126 | Please update the comment to indicate what non-positive values are allowed etc. |
This change would also remove the logical gymnastics needed to give special meaning to the mask with respect to undef/poison:
http://llvm.org/docs/LangRef.html#shufflevector-instruction
If the mask not an operand, then it's easier to explain that it doesn't follow the typical operand rules, so +1 for that benefit too.
Add more comments, fixed the GlobalISel representation of shuffles, misc other cleanups. Still haven't addressed the bitcode reader/writer issues.
llvm/lib/Analysis/InstructionSimplify.cpp | ||
---|---|---|
4446 | maybe introduce new helper function to check if mask is "undef"? |
llvm/lib/Analysis/InstructionSimplify.cpp | ||
---|---|---|
4446 | I could introduce ShuffleVectorInst::MaskIsAllUndef and ShuffleVectorInst::MaskIsAllZero, I guess. Not sure how helpful that actually is for a predicate that can be expressed on one line anyway. |
Figured out the changes necessary to get bitcode reading/writing working. ninja check now passes.
Not sure what parts I should split off to get reviewed separately. Probably the GlobalISel changes could be split off without adding too much complexity; not sure what else.
llvm/lib/Analysis/InstructionSimplify.cpp | ||
---|---|---|
4446 | How about introducing the constant UndefMaskElem and changing this to something like: if (all_of(Mask, [](int Elem) { return Elem == UndefMaskElem; })) That would make it self-explanatory again and wouldn't expose the specific encoding of undef mask elems. | |
llvm/lib/IR/Instructions.cpp | ||
1918–1919 | With the suggestion: NewMask[i] = UndefMaskElem |
llvm/lib/Analysis/InstructionSimplify.cpp | ||
---|---|---|
4446 | You mean just llvm::UndefMaskElem? I guess that makes sense. |
Rebase. Address review comment. More work on bitcode.
I probably need to come up with some testcases for the bitcode encode/decode; we currently don't have great coverage. Otherwise, I think this patch is complete.
Fixed one more issue. While I'm at it, get rid of the old overload of ConstantExpr::getShuffleVector, which only had a few remaining uses.
llvm/include/llvm/IR/Constants.h | ||
---|---|---|
1229 | an shufflevector -> a shufflevector | |
1233 | an shufflevector -> a shufflevector | |
llvm/include/llvm/IR/IRBuilder.h | ||
2621 | Add an assert that Mask isa<Constant> ? | |
llvm/include/llvm/IR/Instructions.h | ||
1996–1997 | This reads awkwardly to me (if you agree, we can update the LangRef too). | |
2024–2025 | This comment doesn't add value to me, so I'd just delete it. If we want to keep it, should fix it to be a proper sentence with a capital letter and period. | |
llvm/include/llvm/IR/PatternMatch.h | ||
1357–1415 | Update/add comment: | |
1384 | IIUC, this is meant to replace the current uses of m_Zero() or m_ZeroInt() with shuffle pattern matching, but there's a logic difference because those matchers allow undefs, but this does not? Are we missing unittests to catch that case? | |
llvm/lib/Bitcode/Writer/ValueEnumerator.cpp | ||
939–941 | Indentation looks off-by-1. | |
llvm/lib/CodeGen/GlobalISel/IRTranslator.cpp | ||
1977 | if (auto *SVI = ...) | |
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
3441 | if (auto *SVI = ...) | |
llvm/lib/ExecutionEngine/Interpreter/Execution.cpp | ||
1890 | I've never looked in here before - what happens currently if the index is -1 (undef)? | |
llvm/lib/IR/ConstantsContext.h | ||
163–164 | This comment doesn't add value to me, so I'd just delete it. If we want to keep it, should fix it to be "two operands" and a proper sentence with a capital letter and period. | |
582 | (not familiar with this part) /// Return true if this is an insertvalue or extractvalue expression, /// and the getIndices() method may be used. | |
llvm/lib/IR/Instruction.cpp | ||
446 | Would have suggested "auto *" here, but really the whole thing should be turned into a switch() as a preliminary cleanup? | |
llvm/lib/IR/Instructions.cpp | ||
1897–1898 | I see this is copied, but indentation seems off. | |
llvm/lib/Transforms/Scalar/GVN.cpp | ||
302 | Use "auto *" in each clause (or convert to switch). |
llvm/include/llvm/IR/IRBuilder.h | ||
---|---|---|
2621 | cast<T> already does this assert. | |
2628 | [0 .. uint32_MAX] is wider than [0 .. int_MAX]. Assert the values are in range somehow? | |
llvm/include/llvm/IR/Instructions.h | ||
1996–1997 | I also found the wording of the description of shufflevector in the language ref confusing. Since this description seems to be related I thought it worth mentioning. It's unclear to me how the values specify vector inputs. I assume for two vectors <4 x 18>, that the mask <3, 4, 5, 6> takes the last element of the first vector, and the first three elements of the second vector, but this isn't explicitly stated anywhere. |
llvm/include/llvm/IR/IRBuilder.h | ||
---|---|---|
2628 | Either the resulting shuffle mask is invalid (and we'll assert later), or the input contained UINT_MAX, in which case we'll treat it as undef. | |
llvm/include/llvm/IR/Instructions.h | ||
2024–2025 | I think the comment is there to identify the magic number. Maybe I should introduce a named constant ShuffleVectorInst::NumOperands to make that clear. | |
llvm/include/llvm/IR/PatternMatch.h | ||
1384 | I guess we're missing tests, yes; I didn't see any failures. | |
llvm/lib/ExecutionEngine/Interpreter/Execution.cpp | ||
1890 | The interpreter evaluates undef to zero, and therefore it behaves as if every undef is replaced with zero. Using std::max here preserves the existing behavior. | |
llvm/lib/IR/ConstantsContext.h | ||
582 | As part of constructing ConstantExprs, there's a map from ConstantExprKeyType to ConstantExpr, to ensure each expression is unique. "Indexes" is just a list of integers, so I was reusing it as a shortcut, to try to avoid making an extra ArrayRef. This only impacts the map itself, not the final ConstantExprs. It looks like I missed that a few places use hasIndices() to check whether the ConstantExprKeyType key stores data in Indexes. I should probably just add the extra member variable to make the logic clear, even if it wastes a tiny bit of stack space. |
Addressed review comments. Rebased.
Chris Tetreault mentioned offline that we might want to change the return value of getShuffleMask() now, to try to save some work later when we add more forms of scalable vector shuffle. (Not actually changing the functionality of any code, just pre-emptively changing the types so we don't have to change code again if it doesn't actually examine the shuffle mask directly.) I'll look into it.
LGTM, but should get a 2nd opinion since I'm not familiar with some of the parts.
Also, please update the related LangRef text for shufflevector in or alongside this patch.
Could add some ascii art to visualize the mask indexing as requested by @ctetreau ?
shufflevector <3 x float> %x, <3 x float> %y, <n x i32> %mask: +---+---+---+---+---+---+ | x | y | vector operands +---+---+---+---+---+---+ | 0 | 1 | 2 | 0 | 1 | 2 | per-operand indexing +---+---+---+---+---+---+ | 0 | 1 | 2 | 3 | 4 | 5 | shuffle mask indexing +---+---+---+---+---+---+
llvm/include/llvm/IR/Instructions.h | ||
---|---|---|
2047–2048 | Can replace the magic -1 with the named "UndefMaskElem" in this comment. | |
2057 | Can replace the magic -1 with the named "UndefMaskElem" in this comment. |
Thanks @efriedma for working on this. The overall approach seems good to me!
Mostly added some nits, with the exception of one question on what to do with a shufflevector with fixed-width mask and scalable source vectors.
llvm/include/llvm/IR/Instructions.h | ||
---|---|---|
1991 | nit: do we want to make this a member of ShuffleVectorInst, as this is likely the only context in which it will be used? | |
llvm/lib/Bitcode/Writer/BitcodeWriter.cpp | ||
2705–2706 | nit: >80char (please run through clang-format) | |
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
3451–3452 | Should this use m_ZeroMask() ? | |
llvm/lib/IR/ConstantFold.cpp | ||
886–887 | Is it worth adding a m_UndefMask ? | |
llvm/lib/IR/Constants.cpp | ||
2282 | nit: is there a reason for dropping const here? | |
llvm/lib/IR/ConstantsContext.h | ||
149–150 | The number of elements in the result matches the number of elements in the mask, but if we assume 'scalable' from one of the source vectors, this means we make the choice that we cannot extract a fixed-width vector from a scalable vector, e.g. shufflevector(<vscale x 4 x i32> %in, <vscale x 4 x i32> undef, <4 x i32> <i32 0, i32 1, i32 2, i32 3>) The LangRef does not explicitly call out this case (and it currently fails to compile), but it would provide a way to extract a sub-vector from a scalable vector. shufflevector(<vscale x 4 x i32> %in, <vscale x 4 x i32> undef, <4 x i32> <i32 5, i32 6, i32 7, i32 8>) which may not be valid if vscale = 1. Alternatively, we could implement this with an additional extract intrinsic and add the restriction that if the source values are scalable, the mask must be scalable as well. If so, then we should update the LangRef to explicitly disallow the above case. | |
467 | nit: make ConstantExprKeyType a class? | |
llvm/lib/IR/Instructions.cpp | ||
2012 | nit: odd newline |
llvm/lib/IR/ConstantsContext.h | ||
---|---|---|
149–150 | For the time being, Either this assumption must hold, or a bool must be added that is true if the mask is from a scalable vector. There are places in the codebase that assume that the mask has a concrete value known at compile time. These sites are currently the sites of bugs, and the fix is to not try to access the mask if the vector is scalable. We need some way of knowing that the mask is scalable, either by the assumption made here, or a queryable bool value. |
Any specific part you're worried about?
llvm/include/llvm/IR/Instructions.h | ||
---|---|---|
1991 | ShuffleVectorInst::UndefMaskElem is sort of long. Maybe we don't use it in enough places to matter. | |
llvm/lib/CodeGen/SelectionDAG/SelectionDAGBuilder.cpp | ||
3451–3452 | I don't want to make unrelated functional changes in this patch. | |
llvm/lib/IR/ConstantFold.cpp | ||
886–887 | We currently only use this pattern in three places, and I doubt we're going to add more. | |
llvm/lib/IR/Constants.cpp | ||
2282 | Doesn't really matter either way, but we generally don't add const markings to local variables just because we can. | |
llvm/lib/IR/ConstantsContext.h | ||
149–150 | I'd prefer to just say the result is scalable if and only if the operand is scalable, at least for now. It's not clear what semantics we want for a fixed shuffle of a scalable operand, and allowing scalable splats of fixed vectors doesn't really allow expressing anything new. We can relax it later once we've settled on our general approach to scalable shuffles. |
I don't know anything about the scalable vector enhancements, but that seems covered now. Nuances of the bitcode serialization are beyond me, but I trust you've stepped through that. So, no worries. :)
Looks like D76649 landed and added another IRBuilder call that needs updating:
[2974/5437] Building CXX object lib/Target/X86/CMakeFiles/LLVMX86CodeGen.dir/X86PartialReduction.cpp.o FAILED: lib/Target/X86/CMakeFiles/LLVMX86CodeGen.dir/X86PartialReduction.cpp.o CCACHE_DIR=/mnt/disks/ssd0/ccache CCACHE_MAXSIZE=20G CCACHE_CPP2=yes CCACHE_HASHDIR=yes /usr/bin/ccache /usr/bin/clang++ -DGTEST_HAS_RTTI=0 -D_DEBUG -D_GNU_SOURCE -D__STDC_CONSTANT_MACROS -D__STDC_FORMAT_MACROS -D__STDC_LIMIT_MACROS -Ilib/Target/X86 -I/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/lib/Target/X86 -I/usr/include/libxml2 -Iinclude -I/mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/include -gmlt -fPIC -fvisibility-inlines-hidden -Werror=date-time -Werror=unguarded-availability-new -Wall -Wextra -Wno-unused-parameter -Wwrite-strings -Wcast-qual -Wmissing-field-initializers -pedantic -Wno-long-long -Wimplicit-fallthrough -Wcovered-switch-default -Wno-noexcept-type -Wnon-virtual-dtor -Wdelete-non-virtual-dtor -Wstring-conversion -fdiagnostics-color -ffunction-sections -fdata-sections -O3 -fvisibility=hidden -fno-exceptions -fno-rtti -UNDEBUG -std=c++14 -MD -MT lib/Target/X86/CMakeFiles/LLVMX86CodeGen.dir/X86PartialReduction.cpp.o -MF lib/Target/X86/CMakeFiles/LLVMX86CodeGen.dir/X86PartialReduction.cpp.o.d -o lib/Target/X86/CMakeFiles/LLVMX86CodeGen.dir/X86PartialReduction.cpp.o -c /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/lib/Target/X86/X86PartialReduction.cpp /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/lib/Target/X86/X86PartialReduction.cpp:387:22: error: call to member function 'CreateShuffleVector' is ambiguous Ops[0] = Builder.CreateShuffleVector(Ops[0], Ops[0], {0, 1}); ~~~~~~~~^~~~~~~~~~~~~~~~~~~ /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/include/llvm/IR/IRBuilder.h:2625:10: note: candidate function Value *CreateShuffleVector(Value *V1, Value *V2, ArrayRef<uint32_t> Mask, ^ /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/include/llvm/IR/IRBuilder.h:2633:10: note: candidate function Value *CreateShuffleVector(Value *V1, Value *V2, ArrayRef<int> Mask, ^ /mnt/disks/ssd0/agent/workspace/amd64_debian_testing_clang8/llvm/include/llvm/IR/IRBuilder.h:2618:10: note: candidate function not viable: cannot convert initializer list argument to 'llvm::Value *' Value *CreateShuffleVector(Value *V1, Value *V2, Value *Mask,
(Also merged followup https://github.com/llvm/llvm-project/commit/ba4764c2cc14b0b495af539a913de10cf8268420 to fix a memory leak caught by the bots.)
Please update the comment to indicate what non-positive values are allowed etc.