Page MenuHomePhabricator

Allow scalable vectors in computeKnownBits
ClosedPublic

Authored by reames on Oct 21 2022, 10:48 AM.

Details

Summary

This extends the computeKnownBits analysis to support scalable vectors. The critical detail is in deciding how to represent the demanded elements of a vector whose length is unknown at compile time.

For this patch, I adopt the convention that we track one bit which corresponds to all lanes. That is, that bit is implicitly broadcast to all lanes of the scalable vector resulting in all lanes being demanded. This is the same convention we use in getSplatValue in SelectionDAG.

Note that this convention doesn't actually impact much. Most of the code is agnostic to the interpretation of the demanded elements, and the few cases which actually care need case by case handling anyways. In this patch, I just bail out of those cases.

A prior patch (D128159) proposed using a different convention in SDAG. I don't see any strong reason to prefer one scheme over the other, so I propose we go with this one as it's conceptually the simplest. Getting known and demanded bit optimizations unblocked at all is a significant win.

I've locally implemented this scheme in reasonable large parts of ValueTracking.cpp and SelectionDAG equivalents, and have not hit any blockers. If this is approved, I plan to post a series of patches plumbing this through all the relevant parts.

In the discussion on that patch, a preference was expressed for introducing some form of abstraction around the demanded elements. I'll note that I've played with several variations on that idea locally, and have yet to find anything which results in more readable code. If anyone has concrete ideas in this area, I'm happy to explore in follow up patches. I'd strongly prefer to be making API changes in NFC manner with tests in place.

Diff Detail

Event Timeline

reames created this revision.Oct 21 2022, 10:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 21 2022, 10:48 AM
reames requested review of this revision.Oct 21 2022, 10:48 AM
Herald added a project: Restricted Project. · View Herald TranscriptOct 21 2022, 10:48 AM
efriedma added inline comments.Oct 24 2022, 11:47 AM
llvm/lib/Analysis/ValueTracking.cpp
1840

I guess this is a more general thing, but does it make sense to analyze insertelement even if we can't figure out which element is being inserted?

1870

Duplicate ScalableVectorType check?

Does it make sense to analyze the vector operand without trying to figure out which elt is demanded?

reames added inline comments.Oct 24 2022, 1:24 PM
llvm/lib/Analysis/ValueTracking.cpp
1840

I'd had this same thought, and in fact was working on a patch for that locally. This turns out to be a bit trickier than it seems as the unknown index can be out of bounds. As a result, the resulting vector could be entirely poison. In demanded bits, we seem to try not inferring bits of potentially undef values, and I think poison needs treated the same.

1870

Yep, looks like it's redundant, will remove.

Same point as above about poison/undef.

efriedma added inline comments.Oct 24 2022, 1:31 PM
llvm/lib/Analysis/ValueTracking.cpp
1840

Inferring bits in a poison value shouldn't be an issue. Since it's poison, the entire value is meaningless. (If it didn't work this way, we wouldn't be able to compute known bits at all without proving a value isn't poison.)

The issue with undef is just that we can't infer 1 or 0 for a bit we know is undef, because it could be both. (This sort of difficult reasoning is one of the reasons we're trying to move away from undef...)

reames updated this revision to Diff 470277.Oct 24 2022, 1:33 PM

Address review comment

efriedma accepted this revision.Oct 24 2022, 1:36 PM

LGTM, but give it a couple days to see if there are other comments on the general approach.

This revision is now accepted and ready to land.Oct 24 2022, 1:36 PM

@reames Can I be rude and ask you to wait until Friday? I'm a bit swamped at the minute and would love a little more time to properly digest the idea before it lands.

@reames Can I be rude and ask you to wait until Friday? I'm a bit swamped at the minute and would love a little more time to properly digest the idea before it lands.

Not a problem.

paulwalker-arm accepted this revision.Oct 29 2022, 4:10 AM
paulwalker-arm added a subscriber: dmgreen.

Thanks for waiting @reames, I cannot say I'm truly in love with the approach but it is the best proposal so far and least likely to bit us later. One of my unfounded reservations of previous proposals was when transition between fixed length and scalable vectors. With this approach ensuring there's only ever one representation for all scalable vectors it should at least be easier to spot invalid code paths.

One day I see value in having a more powerful representation. My current thinking is to have an extra bit whereby:

0 -> duplicates the state of the final known lane across all unknown lanes [ACBDDDDDDDDDDDDD]
1 -> duplicates all known lanes across across all unknown lanes in "known lanes" sized blocks. [ABCDABCDABCDABCD]

Having such a representation we can model scalar and subvector inserts/extracts as well as help with cases where the even an odd lanes are constructed via predicated logic.

I've not bug down on this so much though and as you say the important thing is getting some level of support and splat handling is likely to give us some nice early wins. It'll also make @dmgreen's day, which is a bonus.

Again, thanks for allowing me pondering time :)

This revision was landed with ongoing or failed builds.Oct 30 2022, 9:00 AM
This revision was automatically updated to reflect the committed changes.

For the record, let me sketch out where I think this might be going long term.

For scalable vectors, we have a couple of idiomatic patterns for representing demanded elements.

The first is a splat - which this patch nicely handles by letting us do lane independent reasoning on scalable vectors. This covers a majority of the cases I've noticed so far, and is thus highly useful to have in tree as we figure out next steps.

The second is sub_vector insert/extract. This comes up naturally in SDAG due to the way we lower fixed length vectors on RISCV (and, I think, ARM SVE.) This requires tracking a prefix of the demanded bits corresponding to the fixed vector size, and then a single bit smeared across remaining (unknown number of) lanes.

We could pick the prefix length in one of two ways:

  1. From the fixed vector being inserted or extracted.
  2. From the minimum known vector register size. This is more natural in DAG; at the IR layer, this requires combining the minimum vector length of a type which the minimum vscale_range value.

The third is scalar insert/extract. For indices under the minimum vector size, this reduces the former case. I don't yet know how common various runtime indices we can't prove in bounds are. One example we might see is the "end of vector - 1" pattern which comes e.g. from loop vectorization exit values. There may also be others. I don't yet really have a good sense here.

The fourth is generalized shuffle indices. (i.e. figuring out what lanes are demanded from a runtime shuffle mask) We're several steps from being able to talk about this concretely, and I'm not yet convinced we'll need anything here at all. If we do need to go here, this adds a huge amount of complexity. I'm hoping we don't get here.

I'm pretty sure we'll need to generalize at least as far as subvector insert/extract. I'm not sure about going beyond that yet.

@paulwalker-arm What's the motivation for your ABCDABCDABCDABCD pattern? That looks like a splat of a larger than element type value. What idioms does this come from?