This is an archive of the discontinued LLVM Phabricator instance.

[SVE][Builtins] Add metadata to intrinsic calls for builtins that don't define the result of inactive lanes.
AbandonedPublic

Authored by paulwalker-arm on Jan 8 2023, 4:39 PM.

Details

Summary

The ACLE for SVE define a repeating set of builtins that allow the
result of inactive lanes to be zeroed (Z), copied from an input
operand (M) or have an undefined value (X). When lowering these
builtins we lose the semantics of the undefined variants because to
keep the intrinsic count down we chose to treat them as M forms.

This largely makes sense because in the majority of instances only
the M form is backed by a real instruction. This does mean we miss
out on some optimisation opportunities and so this patch introduces
metadata to the intrinsic calls that allow us to represent the cases
where an M form can be considered to be an X form. This metadata is
freely ignorable because copying the inactive lanes from an input
operand is a valid option to represent an undefined value, and
matches the behaviour before this patch.

To demonstrate the metadata's usage this patch includes a trivial
optimisation so that svadd_x emits the unpredicated variant of ADD
as expected.

NOTE: I did investigate representing the undefined lanes using a select on the governing predicate but this proved a poor design because optimisations became order sensitive, the extra IR made use count protection harder to handle and the select instruction itself has strict rules relating to poison that hampered the intent of this change.
NOTE: All the existing tests pass without regeneration and so to keep the reviewed patch small I only regenerated one of the tests to show the affect. If agreeable I'll regenerate all the other tests just before landing the patch.

Depends on D141056

Diff Detail

Event Timeline

paulwalker-arm created this revision.Jan 8 2023, 4:39 PM
Herald added a project: Restricted Project. · View Herald Transcript
paulwalker-arm requested review of this revision.Jan 8 2023, 4:39 PM
Herald added projects: Restricted Project, Restricted Project. · View Herald TranscriptJan 8 2023, 4:39 PM

Using metadata seems sensible, but did you also identify any downsides? I could imagine that we'd need to manually propagate metadata to any nodes after we do a combine (which can't be blindly copied?), e.g. add + mul -> mla, this new intrinsic would also need the metadata.

For intrinsics that don't have a directly corresponding (unpredicated) LLVM IR instruction, is there still a way to use this information in SelectionDAG?

the select instruction itself has strict rules relating to poison that hampered the intent of this change

For my understanding, can you elaborate what these strict rules regarding poison are that hamper such a change, and what it was that you tried?

Matt added a subscriber: Matt.Jan 9 2023, 3:15 PM

Using metadata seems sensible, but did you also identify any downsides? I could imagine that we'd need to manually propagate metadata to any nodes after we do a combine (which can't be blindly copied?), e.g. add + mul -> mla, this new intrinsic would also need the metadata.

I don't really see manually propagation as a downside because it's not functionally required but rather advantageous to maximise optimisation opportunities. The downside is the opposite in that any transformation that wants to rely on the inactive lanes being defined as before this patch will now need to check for the presence of (or rather lack of) the new metadata before blindly reusing the result of an existing SVE intrinsic call. The transformation can still reuse the call it must just first discard the metadata.

For intrinsics that don't have a directly corresponding (unpredicated) LLVM IR instruction, is there still a way to use this information in SelectionDAG?

Truth be told I'm not entire sure how this will play out. I'm not sure whether it's better to use the information within the IR as I'm doing in this patch or whether this should be used solely when lowering IR to DAG. So it's really an experiment to see what sticks while proving a route to fix some of the issues we've already observed with how we represent the X forms.

Predicated->unpredicted aside another use for encoding undefiness is that it helps with things the FMAs where we can use FMAD if that better suits register allocation much like we do for stock IR.

the select instruction itself has strict rules relating to poison that hampered the intent of this change

For my understanding, can you elaborate what these strict rules regarding poison are that hamper such a change, and what it was that you tried?

The LangRef states the transformation "select P, A, undef ==> A" is only valid when you can prove the inactive lanes of "A" do not contain poison. I'm unsure if this is a true blocker or a mere inconvenience because to maintain the maximum amount of information we likely don't want to remove the selects anyway. I went down this path by creating an SVE undef intrinsic, which nothing knows about and thus will be left alone. The problem is that it massively polluted the IR and I was worried it'll make it harder to spot/implement the typical combines. For sure the existing combines will need to be changed because they'll not know to look through the new selects.

There is the option to change the clang builtin lowering to provide finer control over which builtins emit these selects, but that just means more changes (updates to existing instcombines) each time we decide a builtin is worth the extra select.

I'll keep experimenting but as I mention within the in code comment, the likely best solution is to have dedicate intrinsics with this being the least intrusive hack.

Perhaps the key word there is "hack" :) I'll investigate the dedicate intrinsics route because perhaps we only require a handful to get the majority of the benefit.

paulwalker-arm planned changes to this revision.Jan 11 2023, 5:30 PM

Just a heads up that I'm likely to abandon this patch because as predicted implementing dedicated intrinsics is looking like the better design and most all the code generation plumbing is already present and so even the implementation is minimal.

paulwalker-arm abandoned this revision.Feb 6 2023, 6:55 AM

D141939 turned out to be the better approach.