This is an archive of the discontinued LLVM Phabricator instance.

[SystemZ] Use vllez + vle to load, shuffle and zero extend a vector.
AbandonedPublic

Authored by jonpa on Mar 17 2020, 4:13 AM.

Details

Reviewers
uweigand
Summary

(Separated and continued from https://reviews.llvm.org/D75978)

I agree that using VLLEZ to implement the zero-extend is a good idea. I'm wondering about the implementation however; it seems odd to combine two quite different manners of emitting that instruction.

First of all, the "canonicalization in DAGCombiner::visitINSERT_VECTOR_ELT()" problem you mention seems to be a problem in general; the code with the two element-inserts could easily arise just from normal middle-end action (not just that new back-end code you added), and we really ought to be able to handle such code in general. So I think we should try to do that, either by overriding the DAGCombine with our own rule (if that's possible), or possibly handling this as a SystemZDAGToDAGISel::Select rule? In any case, I'm sceptical about the new DAG opcode -- it's always better to avoid those for semantics that *can* be described with standard opcodes, since custom opcodes are opaque to all further analysis by the middle-end ...

The other question is how to handle the zero-extend expansion. Logically, a vector zero-extend is fully equivalent to a shuffle intermixing bytes from the original vector with a zero vector. So I think the back-end ought to handle both cases in the same way. Now, we have logic that will intelligently handle shuffles (the whole GeneralShuffle logic). From what I can see, this never emits an UNPACK -- however, it should in theory be able to emit a MERGE, which would later (if one of the inputs is a zero vector) be transformed into an UNPACK via SystemZTargetLowering::combineMERGE.

Sorry for the confusion, but it seems I may have been doing something wrong earlier with the INSERT_VECTOR_ELT nodes, since now that I removed the VLLEZ node it actually seems to work to just use INSERT_VECTOR_ELT nodes the way I first tried. I was probably also too eager to introduce a new SystemZISD node thinking that it was the right thing to (since I wanted the common-code *not* to change anything after that point).

The advantage of a VLLEZ node would be that later optimizations do not confuse things so that two VLE's and a VGBM 0 would instead result. I saw this now in a reduced test case, but this does not seem to be a problem on the benchmarks (never happens). The only difference there is that without the VLLEZ node, one file gets dsgfr:s instead of dsgr:s - in other words it can figure out that the extracted element is 32 bits or less. So it seems to be an advantage, per what you said above.

Is this patch acceptable now? I haven't checked further if the current code misses any cases of VLLEZ, but at the moment I have no reason to believe it does.

Diff Detail

Event Timeline

jonpa created this revision.Mar 17 2020, 4:13 AM
jonpa abandoned this revision.Jun 29 2020, 11:54 PM
Herald added a project: Restricted Project. · View Herald TranscriptJun 29 2020, 11:54 PM