This is an archive of the discontinued LLVM Phabricator instance.

[InstCombine] Add fold for extracting known elements from a stepvector
ClosedPublic

Authored by CarolineConcatto on May 26 2021, 4:02 AM.

Details

Summary

This patch allows folding stepvector + extract to the lane when the lane is
lower than the minimum size of the scalable vector. This fold is possible
because lane X of a stepvector is also X!
For instance, extracting element 3 of a <vscale x 4 x i64>stepvector is 3.

Diff Detail

Event Timeline

CarolineConcatto requested review of this revision.May 26 2021, 4:02 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 26 2021, 4:02 AM
foad added inline comments.May 26 2021, 4:08 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
354

Should be "<" not "<=", surely?

foad added inline comments.May 26 2021, 4:14 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
351

You don't need this "if".

354

IndexC->getValue().ult(NumElts) seems safer, since getZExtValue can truncate wide values.

356

Don't need the case to VectorType since getScalarType is defined on Type.

357

Just use getValue here.

CarolineConcatto marked an inline comment as done.

-address Jay Foad's comments

Thank you @foad for your fast review. I've tried to address all of them.
I believe one of the changes you've asked I could not apply.

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
351

Ok, Indeed I don't need it.
I was adding because I thought it could make it less prone to crash in case it was not an intrinsic call, as I see similar tests being done in other parts of the code.
But I agree that because the first if should be enough

354

Hi @foad,
Yes, you are correct! I left it there by mistake.
Thank you for pointing it out.

354

Thank you @foad!
I did not know I could use that.
I believe I can use this in the other patch that I need to do the same test.

357

It crashes when I replace
IndexC->getZExtValue() by IndexC->getValue()

Assertion `C->getType() == Ty->getScalarType() && "ConstantInt type doesn't match the type implied by its value!"' failed.
CarolineConcatto marked 3 inline comments as done.May 26 2021, 4:51 AM
david-arm added inline comments.May 26 2021, 6:31 AM
llvm/test/Transforms/InstCombine/vscale_extractelement.ll
189

Maybe it's useful to have one other positive test with a non-zero lane index, i.e 2 or 3?

foad added inline comments.May 26 2021, 6:38 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
357

OK, it's a bit more complicated if the index type does not match the vector element type. You also need to give up if the index value is too large to fit in the vector element type (this is the case where the langref says "If the sequence value exceeds the allowed limit for the element type then the result for that lane is undefined").

How about something like:

Type *Ty = EI.getType();
unsigned BitWidth = Ty->getIntegerBitWidth();
if (IndexC->getValue().getActiveBits() > BitWidth)
  break; // with a suitable comment
auto *Idx = ConstantInt::get(Ty, IndexC->getValue().zextOrTrunc(BitWidth));

You should also add at least one test where the extractelement index type is wider than the stepvector element type.

Matt added a subscriber: Matt.May 26 2021, 9:24 AM
CarolineConcatto marked an inline comment as done.

--address Jay Foad's comments

llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
357

Hi @foad,
I have tried a combination of your suggestion, because as is it did not worked properly.
1)Replacing the tests and updating the Idx to use getValue
When I replace:

if (IndexC->getValue().ult(NumElts)). //test 1

by

if (IndexC->getValue().getActiveBits() > BitWidth) //test2

The invalid test (ext_lane_invalid_from_stepvec) and out of range( ext_lane_out_of_range_from_stepvec) return some values, garbage and 5.

2)Having the original test, but updating the Idx to use getValue
And when I leave the first test(test 1), the second test (test 2) is not needed.

I added the test you suggested too when the index is wider than the vector element and works.

Let me know if that is ok and is what you had in mind.

sdesmalen added inline comments.Jun 3 2021, 12:50 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
355

It seems a bit silly to have a switch statement to handle a single case, if you can write:

if (IID == Intrinsic::experimental_stepvector && IndexC->getValue().ult(NumElts)) {
  ...
}
359

nit: s/!IndexC->getValue().ult/IndexC->getValue().uge/

362

nit: this only has a single use and the RHS expression is trivial, so the variable seems redundant.

CarolineConcatto marked 3 inline comments as done.
  • Remove switch case as suggested
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
355

Yes, for now, it may seem silly, but if we decide to add another intrinsic check this will come in handy.
But I've changed to follow to police: don't write code you don't need now, despite the fact the I still like my implementation.

359

This is not needed anymore.

foad added inline comments.Jun 3 2021, 8:22 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
357

You can see the problem with a test case like this:

define i8 @ext_lane300_from_stepvec() {
; CHECK-LABEL: @ext_lane300_from_stepvec(
; CHECK-NEXT:  entry:
; CHECK-NEXT:    ret i8 44
;
entry:
  %0 = call <vscale x 512 x i8> @llvm.experimental.stepvector.nxv512i8()
  %1 = extractelement <vscale x 512 x i8> %0, i64 300
  ret i8 %1
}

Currently you optimize the result to 44, which is wrong. You should either refuse to optimize, or optimize it to undef ("If the sequence value exceeds the allowed limit for the element type then the result for that lane is undefined").

I was trying to suggest something like:

if (IndexC->getValue().ult(NumElts)) {
  if (IndexC->getValue().getActiveBits() > BitWidth) {
    // IndexC can't be safely truncated to BitWidth bits
    Idx = UndefValue::get(Ty); // or just give up?
  } else {
    Idx = ConstantInt::get(Ty, IndexC->getValue().zextOrTrunc(BitWidth));
  }
}
llvm/test/Transforms/InstCombine/vscale_extractelement.ll
250–251

Don't need ; after a declaration. In LLVM IR ; starts a comment.

  • Add check when the index uses more bits than the number of bit available in the vector scalar type
CarolineConcatto marked 2 inline comments as done.Jun 8 2021, 5:06 AM
CarolineConcatto added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
357

Hi @foad,
Thank you for pointing that problem again to me.
I did miss that before, sorry.
I added new tests to cover this now.

foad added inline comments.Jun 8 2021, 5:33 AM
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
361

All these quantities are unsigned so there is no need to subtract 1 for the sign bit.

llvm/test/Transforms/InstCombine/vscale_extractelement.ll
259

This example *should* be optimised to ret i8 128. stepvector generates unsigned value, so i8 128 is fine (it does not mean -128).

CarolineConcatto marked an inline comment as done.
  • Address Foad's comments
CarolineConcatto marked 2 inline comments as done.Jun 8 2021, 10:15 AM
CarolineConcatto added inline comments.
llvm/lib/Transforms/InstCombine/InstCombineVectorOps.cpp
361

LLVM-IR always surprising me.
I was surprised when the test replaced 255 by -1.
But if llvm-ir does not reserve 1 bit for sign I accept that -1 is equal to 255.

foad accepted this revision.Jun 8 2021, 10:40 AM

LGTM, thanks!

This revision is now accepted and ready to land.Jun 8 2021, 10:40 AM