This is an archive of the discontinued LLVM Phabricator instance.

[AArch64][SVE] POC: Use predicate registers for <N x i1> expression trees.
Changes Planned · Public

Authored by sdesmalen on Jan 18 2022, 9:30 AM.

Details

Reviewers
efriedma
Summary

By default, fixed-width i1 vectors are promoted, but when SVE is available,
some expression trees can be rewritten to use <vscale x M x i1> types,
such that all operations are performed on predicate registers, thus
avoiding unnecessary sign-extends and truncates.

The example chosen in this patch is to optimise an OR reduction
of a <N x i1> type, which can be implemented directly with a PTEST
instruction.

Note: this patch also contains a few other improvements that can be
split out into individual patches.
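
To make the summary above concrete, here is a minimal scalar sketch of the two ways an OR reduction over i1 lanes can be computed. It is not taken from the patch: the function names `orReducePromoted` and `orReducePredicate` are hypothetical, and the bitmask only stands in for a predicate register. It does not model the real SVE predicate layout or the flags that PTEST sets.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Promoted form (hypothetical model): every i1 lane is widened to an i8 that
// is either 0x00 or 0xFF, the lanes are OR-ed, and the result is truncated
// back to a single bit. The widening and truncation are the extra work the
// summary wants to avoid.
template <std::size_t N>
bool orReducePromoted(const std::array<bool, N> &lanes) {
  std::uint8_t acc = 0;
  for (bool lane : lanes) {
    std::uint8_t promoted = lane ? 0xFF : 0x00; // sign-extend i1 -> i8
    acc |= promoted;
  }
  return (acc & 1) != 0; // truncate i8 -> i1
}

// Predicate form (hypothetical model): lanes are packed one bit per lane into
// a mask, and the reduction is a single "is any lane active?" test.
template <std::size_t N>
bool orReducePredicate(const std::array<bool, N> &lanes) {
  static_assert(N <= 64, "this sketch packs the lanes into one 64-bit mask");
  std::uint64_t pred = 0;
  for (std::size_t i = 0; i < N; ++i)
    if (lanes[i])
      pred |= (std::uint64_t{1} << i);
  return pred != 0; // stands in for a predicate test
}
```

Both functions return the same value for every input; the difference is only in how much per-lane widening work is done before the final test.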

Event Timeline

sdesmalen created this revision. Jan 18 2022, 9:30 AM
sdesmalen requested review of this revision. Jan 18 2022, 9:30 AM
efriedma added a subscriber: efriedma.

The other possible approach I can think of is to reconsider the way legalization works for i1 vectors. This transform is basically reversing work done by type legalization: the legalizer promotes i1 vectors because they aren't legal. We could, instead, use some sort of custom legalization for i1 vectors: instead of promoting the element type, convert them directly to scalable vectors. Probably more work to implement initially. But it might be easier to reason about the profitability if we avoid generating sign-extensions that shouldn't exist in the first place.

Which approach is better depends on how complex propagatePredicateTy gets, I guess. If we just have 100 lines of code to reverse sign-extensions, fine; if we end up with 1000 lines, probably we should reconsider the approach.

Matt added a subscriber: Matt. Jan 25 2022, 3:16 PM

My understanding is that decisions have been made for NEON on how to represent fixed-width vectors of i1s (i.e. through promotion) and we're kind of bound to those choices going forward. This avoids mixing the two representations (or better: their definition of whether they are illegal/legal types) based on the number of elements in the vector or on where the vectors are used. It seems doable to undo the type legalisation for certain cases, such as vecreduce_or. From what I've seen so far, I expect we'll need to support only a handful of cases to bubble up the sign-extend + extract_subvector.
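
As a rough illustration of what "a handful of cases" could look like, the sketch below walks a toy expression tree and reports whether everything feeding the root could stay in predicate form. It is a simplified model, not code from this revision or from D119346: the `Op` enum, `Node`, `makeNode` and `canStayInPredicate` are all hypothetical names, and real SelectionDAG nodes carry types and extra operands that are ignored here.

```cpp
#include <memory>
#include <utility>
#include <vector>

// Hypothetical, heavily simplified stand-ins for DAG opcodes.
enum class Op { SetCC, SignExtend, ExtractSubvector, Or, And, Load };

struct Node {
  Op op;
  std::vector<std::shared_ptr<Node>> operands;
};

std::shared_ptr<Node> makeNode(Op op,
                               std::vector<std::shared_ptr<Node>> operands = {}) {
  return std::make_shared<Node>(Node{op, std::move(operands)});
}

// Returns true when the tree rooted at `n` consists only of the supported
// cases, so the sign-extends could in principle be pushed past the root and
// the computation done on i1 lanes.
bool canStayInPredicate(const Node &n) {
  switch (n.op) {
  case Op::SetCC:
    return true; // a compare produces one boolean per lane directly
  case Op::SignExtend:
  case Op::ExtractSubvector:
  case Op::Or:
  case Op::And:
    if (n.operands.empty())
      return false;
    for (const auto &operand : n.operands)
      if (!operand || !canStayInPredicate(*operand))
        return false;
    return true;
  case Op::Load:
    return false; // not one of the supported cases in this sketch
  }
  return false;
}
```

Each opcode added to the supported list is one more case to reason about, which is the complexity trade-off raised earlier in the thread.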

I've simplified the approach and put up a new patch here: D119346

sdesmalen planned changes to this revision. Feb 9 2022, 8:31 AM

For NEON, we're obviously forced to promote, sure. And that means even if we have SVE, we're forced to promote across call boundaries. That doesn't necessarily constrain what we do within a function; there's space to prefer SVE operations more aggressively. But it makes things more complicated, sure. In particular, we probably don't want to try to deal with all the side-effects if we try to mark the types "legal".

I think there are still potential alternatives to consider, though. Maybe instead of actually using the type legalization machinery, we could DAGCombine operations involving fixed-width predicates before type legalization. Or we can use custom lowering to generate some sequence that eventually produces a value of the right type, but is easier to analyze. There's some space to explore here for representations of fixed-width SETCC that aren't just SETCC with a promoted result type.

Oh, hmm, that's basically what you're doing in D119346. Okay. :)