Page MenuHomePhabricator

[SVE] Remove calls to VectorType::getNumElements from IR
ClosedPublic

Authored by ctetreau on Jun 9 2020, 1:54 PM.

Diff Detail

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

The goal for this patch is to remove all invalid calls to VectorType::getNumElements with as few behavior changes as possible. If some code previously called getNumElements via a base VectorType pointer, we are assuming that it meant to treat the vector as a fixed width vector, and are instead unconditionally casting to FixedVectorType. Any test failures that arise from this change are cases where the assumption wasn't correct, and these are being handled on a case by case basis.

We're trying to avoid behavior changes of the form "this branch used to be taken for scalable vectors, now it isn't" because 1) There are a ton of such cases, and if we had to write tests for all of them right now this work would never get done and 2) A lot of these cases are actually impossible to trigger for scalable vectors (as of today), so we can't actually write tests for them. In general anything that has to do with constants is hard to test with scalable vectors because there are only a few scalable vectors that it's actually possible to construct constants for today. This means that some very strange code will result. Things like:

if (auto *VTy = dyn_cast<VectorType>(Ty)) {
   unsinged NumElts = cast<FixedVectorType>(VTy);

If we just dyn_cast to FixedVectorType, then a branch that used to be taken for scalable vectors will now be skipped, and this is a behavior change that requires a new test. It is my hope that this code will look very strange to people familiar with the code in question, and that it will get fixed over time. If this does crash, it will be very obvious that the cast probably isn't correct. People are unlikely to see this and ask themselves "is VTy actually supposed to be a FixedVectorType? Did *I* break this?" We are also intentionally avoiding some "obvious" fixes like auto *VTy = VectorType::get(SomeTy, SomeVTy->getNumElements()) => auto *VTy = VectorType::get(SomeTy, SomeVTy->getElementCount()). While it seems obvious that the second form should be fine, it is technically a behavior change that requires a new test, and this sort of thing happens a lot. As with the dyn_cast issue, if we had to write tests for every instance, this work would never get done.

While reviewing this code, please try to keep this in mind.

The goal for this patch is to remove all invalid calls to VectorType::getNumElements with as few behavior changes as possible. If some code previously called getNumElements via a base VectorType pointer, we are assuming that it meant to treat the vector as a fixed width vector, and are instead unconditionally casting to FixedVectorType. Any test failures that arise from this change are cases where the assumption wasn't correct, and these are being handled on a case by case basis.

We're trying to avoid behavior changes of the form "this branch used to be taken for scalable vectors, now it isn't" because 1) There are a ton of such cases, and if we had to write tests for all of them right now this work would never get done and 2) A lot of these cases are actually impossible to trigger for scalable vectors (as of today), so we can't actually write tests for them. In general anything that has to do with constants is hard to test with scalable vectors because there are only a few scalable vectors that it's actually possible to construct constants for today. This means that some very strange code will result. Things like:

if (auto *VTy = dyn_cast<VectorType>(Ty)) {
   unsinged NumElts = cast<FixedVectorType>(VTy);

If we just dyn_cast to FixedVectorType, then a branch that used to be taken for scalable vectors will now be skipped, and this is a behavior change that requires a new test. It is my hope that this code will look very strange to people familiar with the code in question, and that it will get fixed over time. If this does crash, it will be very obvious that the cast probably isn't correct. People are unlikely to see this and ask themselves "is VTy actually supposed to be a FixedVectorType? Did *I* break this?" We are also intentionally avoiding some "obvious" fixes like auto *VTy = VectorType::get(SomeTy, SomeVTy->getNumElements()) => auto *VTy = VectorType::get(SomeTy, SomeVTy->getElementCount()). While it seems obvious that the second form should be fine, it is technically a behavior change that requires a new test, and this sort of thing happens a lot. As with the dyn_cast issue, if we had to write tests for every instance, this work would never get done.

While reviewing this code, please try to keep this in mind.

This implies behaviour changes for scalable vectors though; code paths that previously worked for scalable vectors will now break.

I'd much prefer us fixing the code for scalable vectors even when there's no test to guard the fixed behaviour, over breaking codepaths for scalable vectors without a test showing what's now broken.
Mostly I'm concerned that this will start breaking things for the ACLE implementation. While the ACLE doesn't rely heavily on common LLVM IR instructions, it does so around a few areas such as GEPs.

It would be nice if we can make the trade-offs a bit more consciously between moving forward with this migration and adding support/test-coverage. I'd like to suggest that:

  • Changes like the ones you made for X86 intrinsics in AutoUpgrade.cpp that can only ever use FixedVectorType, can be changed without needing tests.
  • For code that is unlikely to need scalable-vector support any time soon because it is part of a separate LLVM tool (i.e. cannot be reached through invocation of Clang) or because we know there are no code-paths currently leading down that road for scalable vectors in practice, to use cast<FixedVectorType> + FIXME.
  • For code that is possibly on an execution path for scalable vectors:
    • Make the right behavioural change (fix it to support scalable vectors or bail out early) so that nothing breaks.
    • Try to add a test for the behavioural test if this is feasible.
    • If adding a test is not possible or very convoluted, add a FIXME saying this still needs a test for scalable vectors.

Does that sound like a suitable way forward?

I also understand this is not a one-person job so we're here to help out if you can split off some files/parts we can work on.

david-arm added inline comments.Jun 23 2020, 8:26 AM
llvm/lib/IR/DataLayout.cpp
795 ↗(On Diff #271481)

I already have a fix for this here that deals with scalable vectors and adds a test - https://reviews.llvm.org/D82294.

@sdesmalen I spoke to this a bit in D82211. I agree that these are all morally behavior changes. However, the assumption is that any call to getNumElements that, when cast to FixedVectorType, does not cause a test failure is always a FixedVectorType. That said, there's some subset of these changes that should probably get a real fix. I'm counting on code reviewers to point these out to me.

This implies behaviour changes for scalable vectors though; code paths that previously worked for scalable vectors will now break.

I'd much prefer us fixing the code for scalable vectors even when there's no test to guard the fixed behaviour, over breaking codepaths for scalable vectors without a test showing what's now broken.
Mostly I'm concerned that this will start breaking things for the ACLE implementation. While the ACLE doesn't rely heavily on common LLVM IR instructions, it does so around a few areas such as GEPs.

If some new functionality is added to ACLE that calls an LLVM function that "should just work" for scalable vectors, and it blows up in a cast, it should be very obvious what the problem is. "I know because I am a domain expert that this function just needs the minimum number of elements, I can just change the call to getElementCount." This 1 line change can be made by the developer adding the ACLE feature, and a test can be written then, having a concrete test case.

It would be nice if we can make the trade-offs a bit more consciously between moving forward with this migration and adding support/test-coverage. I'd like to suggest that:

  • Changes like the ones you made for X86 intrinsics in AutoUpgrade.cpp that can only ever use FixedVectorType, can be changed without needing tests.
  • For code that is unlikely to need scalable-vector support any time soon because it is part of a separate LLVM tool (i.e. cannot be reached through invocation of Clang) or because we know there are no code-paths currently leading down that road for scalable vectors in practice, to use cast<FixedVectorType> + FIXME.

I'm reluctant to scatter a bunch of FIXME comments everywhere because a lot of these casts are probably actually correct.

  • For code that is possibly on an execution path for scalable vectors:
    • Make the right behavioural change (fix it to support scalable vectors or bail out early) so that nothing breaks.
    • Try to add a test for the behavioural test if this is feasible.
    • If adding a test is not possible or very convoluted, add a FIXME saying this still needs a test for scalable vectors.

Does that sound like a suitable way forward?

I also understand this is not a one-person job so we're here to help out if you can split off some files/parts we can work on.

My hope is that code reviewers will point out cases where there's a correct fix with a reasonable test case. That said, I think there is value in completing the removal of calls to getNumElements as soon as possible:

  1. Every day, new usages are being added. I spend way to much of my day just rebasing my changes and playing whack-a-mole. We can shout on the mailing lists that this is happening, but until people get that deprecated warning, they will keep on as they always have.
  2. The release of LLVM 11 is coming. We have promised to just deprecate these functions rather than remove them until the next release. If we don't make LLVM 11, then we have to wait until LLVM 12.

There is obviously still work to be done, and surely a ton of fallout once the switch gets flipped. I would like to complete the deprecation of VectorType::getNumElements as soon as possible so we can handle as much of that as possible before the release.

ctetreau updated this revision to Diff 272806.Jun 23 2020, 1:07 PM

rebase on top of D82294

ctetreau updated this revision to Diff 273441.Jun 25 2020, 10:20 AM

catch straggler

efriedma added inline comments.Jun 25 2020, 11:04 AM
llvm/include/llvm/IR/GetElementPtrTypeIterator.h
86

cast to FixedVectorType?

llvm/lib/IR/Function.cpp
1338

This doesn't look right.

ctetreau updated this revision to Diff 274229.Jun 29 2020, 1:46 PM

address code review issues

ctetreau updated this revision to Diff 274271.Jun 29 2020, 3:47 PM

catch straggler

RKSimon added inline comments.Jul 9 2020, 1:47 AM
llvm/include/llvm/IR/MatrixBuilder.h
47

Do we not have a method for creating splat vectors that works for fixed and non-fixed vector types?

RKSimon added inline comments.Jul 23 2020, 6:03 AM
llvm/include/llvm/IR/MatrixBuilder.h
47

I think CreateVectorSplat needs to be altered to take a ElementCount type instead of a raw unsigned elt count - are you in a position to do this first?

ctetreau marked an inline comment as done.Jul 23 2020, 12:53 PM
ctetreau added inline comments.
llvm/include/llvm/IR/MatrixBuilder.h
47

The change to CreateVectorSplat is actually fairly trivial. ElementCount can just be used in place of an unsigned within its body and signature. I can certainly make the change, but I'd prefer to do it separate from this patch because:

  1. it's a behavior change that needs tests
  2. as far as I can tell, none of this matrix stuff supports scalable vectors, so making this change would have low practical value
ctetreau updated this revision to Diff 285199.Aug 12 2020, 3:51 PM

catch straggler

ctetreau updated this revision to Diff 285200.Aug 12 2020, 3:54 PM

clang-format

a few minors - you might also want to keep an eye on D85794

llvm/lib/IR/Constants.cpp
233–234

I think you can change the VTy cast without changing functionality here.

310–311

I think you can change the VTy cast without changing functionality here.

321–322

I think you can change the VTy cast without changing functionality here.

llvm/lib/IR/Verifier.cpp
5159–5160

I think you can lose the cast<VectorType> lines and not lose any functionality here.

ctetreau marked 8 inline comments as done.Aug 25 2020, 11:34 AM
ctetreau added inline comments.
llvm/include/llvm/IR/MatrixBuilder.h
47

I'm going to go ahead and use your new overload that takes an ElementCount. I'm going to assert that the vector is not scalable though to ensure that this is NFC.

llvm/lib/IR/Constants.cpp
233–234

ConstantAggregateZero and UndefValue can have ScalableVectorType, but since this function explicitly returns false for 0, and the cast will fail for UndeValue, also producing false, I agree with your statement.

I went ahead and changed the comment to mention it only returns true for fixed width vectors

310–311

this function previously returned true if the entire vector was undef, and dyn_cast'ing to FixedVectorType will change this behavior. I will make this function robust to this case though, and update the comment to mention that this is the one case where this will return true for scalable vector constants

I guess technically, getAggregateElement for a scalable UndefValue asserts, but for the purposes of this function, it all should work out since the caller isn't providing an index, and UndefValue :: <vscale x n x ty> does actually contain UndefValue.

321–322

Since this will never actually return true for a scalable vector, I agree. I'll update the comment document that it only returns true for fixed width vectors

llvm/lib/IR/Function.cpp
1338

fixed in a previous patch.

ctetreau updated this revision to Diff 287724.Aug 25 2020, 11:35 AM
ctetreau marked 4 inline comments as done.

rebase, address code review issues

@sdesmalen, I just wanted to touch bases on this. My understanding is that your concerns with this patch were largely driven by the fact that you and your team were crunching to get ACLE in master prior to the LLVM 11 branch point, and that by adding a bunch of assertion failures, my work was making yours more difficult. However, now that the branch point has come and gone, these concerns are hopefully lessened. If you have any concrete improvements to this, or any of the other patches I have up to remove calls to getNumElements(), feel free to let me know. Otherwise, I would really appreciate if we could get these patches reviewed and merged so I can stop playing whack-a-mole with this function :)

As mentioned, I acknowledge that on some level, substituting a miscompile with a crash is actually a behavior change. However, my goal has been to prevent different control flow paths from being taken after the point of the bug. At the point of a call to getNumElements(), either the compiler will crash if it gets a scalable vector, or it will behave the same as it did before. A function that used to return true should not start returning false without a test case. If I have failed in this goal, any reviewer should feel free to point it out and I will fix it.

RKSimon accepted this revision.Aug 26 2020, 2:03 AM

LGTM - but please be on the look out to help with assertions raised by theis series of changes - the oss fuzzer in particular will find many of these: https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=25165

llvm/lib/IR/IRBuilder.cpp
526

please can you make these auto* to avoid a clang-tidy warning

556

please can you make these auto* to avoid a clang-tidy warning

This revision is now accepted and ready to land.Aug 26 2020, 2:03 AM
ctetreau updated this revision to Diff 288032.Aug 26 2020, 10:22 AM

address code review issues

ctetreau updated this revision to Diff 288135.Aug 26 2020, 4:44 PM

fix tests

This revision was landed with ongoing or failed builds.Aug 27 2020, 11:16 AM
This revision was automatically updated to reflect the committed changes.