This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Enable scalable vectors handling in computeKnownBits
AbandonedPublic

Authored by dmgreen on Jun 20 2022, 12:32 AM.

Details

Summary

The computeKnownBits functions do not currently handle known bits of scalable vectors, as the meaning of DemandedElts is unclear for a vector that is scalable. This changes that so that the majority of knownBits is run, with the check for scalable vectors being pushed into the individual cases that a not trivially lane-wise independent. The DemandedElts size is set to getVectorMinNumElements and asserted to be all-ones, although this patch doesn't make a strong opinion as to what that means, exactly. Nothing currently makes use of it in a way that varies between elements.

In order to keep the tests working this also:

  • Adds known bits for ISD::SPLAT_VECTOR, which is the same bits as the input element (possibly truncated).
  • Adds some simple known bits for ISD::STEP_VECTOR, which for power-two increments we know lower bit zeroes.
  • The patterns for INDEX were made to work with adds or add-like-ors, to keep them working as intended.

Diff Detail

Event Timeline

dmgreen created this revision.Jun 20 2022, 12:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 20 2022, 12:32 AM
dmgreen requested review of this revision.Jun 20 2022, 12:32 AM
Herald added a project: Restricted Project. · View Herald TranscriptJun 20 2022, 12:32 AM

SGTM but I'll defer to people who are actively working with scalable vectors whether the getVectorMinNumElements approach will work.

I had been wondering whether we're going have to move from APInt to some kind of 'DemandedMask' wrapper class to handle everything,

llvm/lib/Target/AArch64/AArch64InstrFormats.td
61 ↗(On Diff #438268)

Just move this as a pre-commit NFC?

I don't like the path this patch is taking. In it current form it is changing what the current interface means without documenting such, nor paying much heed to what all the code paths might do. This is why during initial scalable vector bring up, we decided to go the safest route and essentially bail for scalable vectors. Sure this means we implicitly changed the meaning of the "all known" value but there was little else we could do.

It was discussed during the SVE sync calls we ran a while back and whilst no firm decision was made it was pretty clear that KnownBits will need to undergo the same transition as TypeSize and ElementCount. I don't think this is something we can nor should short cut.

dmgreen updated this revision to Diff 438728.Jun 21 2022, 9:27 AM
dmgreen marked an inline comment as done.
dmgreen edited the summary of this revision. (Show Details)

For now this shores up the meaning of DemandedElts a little - making it always getVectorMinNumElements and assert that it is always all-ones.

I have versions of this patch for ComputeNumSignBits and SimplifyDemandedBits (and some patches on top), but have not sent them for review until we can work through how DemandedElts would work for scalable vectors.

I don't like the path this patch is taking. In it current form it is changing what the current interface means without documenting such, nor paying much heed to what all the code paths might do. This is why during initial scalable vector bring up, we decided to go the safest route and essentially bail for scalable vectors. Sure this means we implicitly changed the meaning of the "all known" value but there was little else we could do.

It was discussed during the SVE sync calls we ran a while back and whilst no firm decision was made it was pretty clear that KnownBits will need to undergo the same transition as TypeSize and ElementCount. I don't think this is something we can nor should short cut.

I'm not sure that's entirely necessary in order to get a lot of good improvement out of computeKnownBits/computeSignBits/SimplifyDemandedBits. A large part of the optimizations they perform are identical for all lanes, and with scalable vectors that is much more common than demanded elt's that vary by lane. This would at least be a nice first step forward without trying to solve everything all at once.

What kind of optimizations would you hope to be solved with scalable demanded elts? That might help define what kind of interface it would need.

RKSimon added inline comments.Jun 21 2022, 9:45 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
2906–2907

should we assert here?

assert((!Op.getValueType().isScalableVector() || DemandedElts.isAllOnes()) && "All scalable vector elements must be demanded");
david-arm added inline comments.Jun 22 2022, 1:23 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3462

I'm surprised we need so few bail-outs for all the opcodes for scalable vectors? For example, ISD::INTRINSIC_VOID, etc. call into target functions that may not yet deal with this properly. @paulwalker-arm is right that there was discussion about this during SVE sync calls with @efriedma and Chris Tetreault and I remember distinctly that we were steered away (for what seemed like sensible reasons) from this approach. It's certainly worth having this discussion again now though. I think part of the problem is that you have to make sure every existing, and new, opcode takes sensible action for scalable vectors. Without a new interface to make it clear that DemandedElts now work with scalable vectors there is possibly a danger of introducing subtle bugs without realising it. At the very least it might be safer to explicitly call out all supported opcodes at the start of the function and bail out for any supported ones, before even getting to the switch statement?

dmgreen added inline comments.Jun 23 2022, 9:39 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3462

When I looked through all the opcodes I had not see anything that would cause problems, including into the AArch64 and RISCV computeKnownBitsForTargetNode methods. Did you have anything in particular that you think would be a problem?

Most interesting operations are lane-wise-identical, just passing DemandedElts through unchanged to other calls. That is the part that gives us a lot of benefit for SVE.

david-arm added inline comments.Jun 24 2022, 1:17 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
3462

It's more that in general not teaching users that DemandedElts for this function now works for scalable vectors could lead to the silent introduction of bugs. For example, does it only refer to the first known minimum number of elements, or does it refer to a repeating pattern of elements for each chunk? I think we should define explicitly what this means.

Also, here you're only changing the meaning of DemandedElts for one function, but there are lots of functions scattered throughout codegen and IR that have similar interfaces that take a DemandedElts. So there is now an inconsistency and is potentially confusing for users as to what it means. A quick grep of llvm/include/llvm showed around 65 declared functions that take a parameter of exactly the same name.

SelectionDAG::computeKnownBits is called by SelectionDAG::MaskedValueIsZero, SelectionDAG::MaskedVectorIsZero, TargetLowering::SimplifyMultipleUseDemandedBits, etc, which also take a DemandedElts parameter and pass it straight on to computeKnownBits. However, the comments on those interfaces haven't been updated to explain they now work for scalable vectors. Rather than go to the trouble of updating the comments on all those functions, I personally think it's easier to just create a new simple wrapper class to represent demanded elements in a similar way to what we did for ElementCount and TypeSize.

Matt added a subscriber: Matt.Jun 28 2022, 2:23 PM

Sorry for the delay - I became busy with some other things. I think I might put up the other patches I have so we can all take a look, as I don't see any huge reasons why not to go with this. We (for the moment) define DemandedElts for scalable vectors that all lanes must be demanded, which we represent as a DemandElts mask of VectorMinNumElements with all ones, and have asserts to make sure that is always true. It is simple, efficient, and gets us a lot of the benefits of KnownBits/SimplifyDemandedBits.

It's more that in general not teaching users that DemandedElts for this function now works for scalable vectors could lead to the silent introduction of bugs. For example, does it only refer to the first known minimum number of elements, or does it refer to a repeating pattern of elements for each chunk? I think we should define explicitly what this means.

It is defined to require that all lanes of the scalable vector are demanded.

Also, here you're only changing the meaning of DemandedElts for one function, but there are lots of functions scattered throughout codegen and IR that have similar interfaces that take a DemandedElts. So there is now an inconsistency and is potentially confusing for users as to what it means. A quick grep of llvm/include/llvm showed around 65 declared functions that take a parameter of exactly the same name.

Like I said, I have patches for ComputeNumSignBits and SimplifyDemandedBits. This covers almost all of the uses of DemandedElts inside SDAG. There are other IR level functions, but they are separate.

SelectionDAG::computeKnownBits is called by SelectionDAG::MaskedValueIsZero, SelectionDAG::MaskedVectorIsZero, TargetLowering::SimplifyMultipleUseDemandedBits, etc, which also take a DemandedElts parameter and pass it straight on to computeKnownBits. However, the comments on those interfaces haven't been updated to explain they now work for scalable vectors.

MaskedVectorIsZero and MaskedValueIsZero with DemandedElts are only actually called from the X86 backend. The others I have patches for, and haven't found any problems with them.

Rather than go to the trouble of updating the comments on all those functions, I personally think it's easier to just create a new simple wrapper class to represent demanded elements in a similar way to what we did for ElementCount and TypeSize.

A DemandedElts will not really be like ElementCount or TypeSize. It is not one thing (a count or type) optionally made scalable. It is a bitmask for each lane in a possibly effectively-infinite vector. It can't just be specified with a Bitmask that gets repeated. As far as I can tell, adding that might only be useful for BITCAST.

The ZERO/SIGN_EXTEND_VECTOR_INREG don't seem to be used for scalable vectors. Which really leaves EXTRACT_SUBVECTOR, EXTRACT_VECTOR_ELT and INSERT_SUBVECTOR/INSERT_VECTOR_ELEMENT, which are specific to the lane. They would probably need to be represented as "all lanes are 0 except these" or "all lanes are 1 except these", which are both useful for different places.

So I think DemandedElts would need to be a repeated pattern with some way to represent lanes that deviate from it, that needs to grow as far as needed for any constant lane indices. Plus all the logic needed to make the anding and oring and extracting work, but I've not looked into seeing how to implement them yet. I'm not against making something like that, but it needs to be added for a reason. Just handling BITCAST and the EXTRACT/INSERTS from scalable vectors doesn't feel like a huge reason to me compared to what else these functions provide.

dmgreen updated this revision to Diff 441438.Jun 30 2022, 9:42 AM

A little bit of rewording.

dmgreen added inline comments.Jun 30 2022, 9:42 AM
llvm/lib/CodeGen/SelectionDAG/SelectionDAG.cpp
2906–2907

I was keeping the assert with the existing one for the length of DemandedElts. I can move it or both of them if you think that's best. (But I don't think that ConstantSDNode/ConstantFPSDNode can be scalable, and the MaxRecursionDepth didn't seem important for whether we assert or not). Let me know what you think. Happy to move it.

Reading through the comments, has anything changed since my first comment? The main response seems to be "but hay it works". I'm sure that's true, but we've had several such moments when adding scalable support and each time we've been told to implement things properly. Hence the new vector type, element count, type size and instruction cost. I'm not hearing good reasons why demanded elements is any different.

To call out specific cases I'm worried about:

  1. Target's that implement fixed length operations via the scalable types. It seems reasonable to be able to track the real elements within their container.
  2. Ensuring operations on <vscale x 1 x types do not get misrepresented as being scalar like.

Whilst I'm sure it's possible to show these changes function correctly, it's my belief that such validation is based on either today's limited implementation or plain luck, rather than the design being solid.

Perhaps others have a different view? To be honest this is an area that I've largely ignored so I'll need to dig deeper but I just don't see a safe implementation without either a new type or new set of functions that are designed specifically for the limited use case you want to handle.

reames added a subscriber: reames.Nov 18 2022, 8:35 AM

FYI, D137140 so this patch is now mostly redundant.

I went back to look at the two parts I left out. From the original commit message that was:
"Adds some simple known bits for ISD::STEP_VECTOR, which for power-two increments we know lower bit zeroes."
"The patterns for INDEX were made to work with adds or add-like-ors, to keep them working as intended."

I applied both of these changes locally, and saw no test difference. Either individually or together. Given that, I'm not going to push those further until we have a motivating example or a test in tree. Feel free to split into it's own patch - with appropriate test - if desired. I'd be happy to review.

reames requested changes to this revision.Nov 18 2022, 8:36 AM
This revision now requires changes to proceed.Nov 18 2022, 8:36 AM
dmgreen abandoned this revision.Dec 8 2022, 4:33 AM

OK. There might be more to do, but that can be done in other reviews. Closing.