This patch allows folding stepvector + extract to the lane when the lane is
lower than the minimum size of the scalable vector. This fold is possible
because lane X of a stepvector is also X!
For instance, extracting element 3 of a <vscale x 4 x i64>stepvector is 3.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
354 | Should be "<" not "<=", surely? |
Thank you @foad for your fast review. I've tried to address all of them.
I believe one of the changes you've asked I could not apply.
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
351 | Ok, Indeed I don't need it. | |
354 | Hi @foad, | |
354 | Thank you @foad! | |
357 | It crashes when I replace Assertion `C->getType() == Ty->getScalarType() && "ConstantInt type doesn't match the type implied by its value!"' failed. |
llvm/test/Transforms/InstCombine/vscale_extractelement.ll | ||
---|---|---|
189 | Maybe it's useful to have one other positive test with a non-zero lane index, i.e 2 or 3? |
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
357 | OK, it's a bit more complicated if the index type does not match the vector element type. You also need to give up if the index value is too large to fit in the vector element type (this is the case where the langref says "If the sequence value exceeds the allowed limit for the element type then the result for that lane is undefined"). How about something like: Type *Ty = EI.getType(); unsigned BitWidth = Ty->getIntegerBitWidth(); if (IndexC->getValue().getActiveBits() > BitWidth) break; // with a suitable comment auto *Idx = ConstantInt::get(Ty, IndexC->getValue().zextOrTrunc(BitWidth)); You should also add at least one test where the extractelement index type is wider than the stepvector element type. |
--address Jay Foad's comments
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
357 | Hi @foad, if (IndexC->getValue().ult(NumElts)). //test 1 by if (IndexC->getValue().getActiveBits() > BitWidth) //test2 The invalid test (ext_lane_invalid_from_stepvec) and out of range( ext_lane_out_of_range_from_stepvec) return some values, garbage and 5. 2)Having the original test, but updating the Idx to use getValue I added the test you suggested too when the index is wider than the vector element and works. Let me know if that is ok and is what you had in mind. |
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
355 | It seems a bit silly to have a switch statement to handle a single case, if you can write: if (IID == Intrinsic::experimental_stepvector && IndexC->getValue().ult(NumElts)) { ... } | |
359 | nit: s/!IndexC->getValue().ult/IndexC->getValue().uge/ | |
362 | nit: this only has a single use and the RHS expression is trivial, so the variable seems redundant. |
- Remove switch case as suggested
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
355 | Yes, for now, it may seem silly, but if we decide to add another intrinsic check this will come in handy. | |
359 | This is not needed anymore. |
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
357 | You can see the problem with a test case like this: define i8 @ext_lane300_from_stepvec() { ; CHECK-LABEL: @ext_lane300_from_stepvec( ; CHECK-NEXT: entry: ; CHECK-NEXT: ret i8 44 ; entry: %0 = call <vscale x 512 x i8> @llvm.experimental.stepvector.nxv512i8() %1 = extractelement <vscale x 512 x i8> %0, i64 300 ret i8 %1 } Currently you optimize the result to 44, which is wrong. You should either refuse to optimize, or optimize it to undef ("If the sequence value exceeds the allowed limit for the element type then the result for that lane is undefined"). I was trying to suggest something like: if (IndexC->getValue().ult(NumElts)) { if (IndexC->getValue().getActiveBits() > BitWidth) { // IndexC can't be safely truncated to BitWidth bits Idx = UndefValue::get(Ty); // or just give up? } else { Idx = ConstantInt::get(Ty, IndexC->getValue().zextOrTrunc(BitWidth)); } } | |
llvm/test/Transforms/InstCombine/vscale_extractelement.ll | ||
250–251 | Don't need ; after a declaration. In LLVM IR ; starts a comment. |
- Add check when the index uses more bits than the number of bit available in the vector scalar type
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
361 | All these quantities are unsigned so there is no need to subtract 1 for the sign bit. | |
llvm/test/Transforms/InstCombine/vscale_extractelement.ll | ||
259 | This example *should* be optimised to ret i8 128. stepvector generates unsigned value, so i8 128 is fine (it does not mean -128). |
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp | ||
---|---|---|
361 | LLVM-IR always surprising me. |
You don't need this "if".