This is an archive of the discontinued LLVM Phabricator instance.

[PowerPC] Fix vector_shuffle combines when inputs are scalar_to_vector of differing types.
Needs RevisionPublic

Authored by amyk on Jul 25 2022, 7:23 AM.

Details

Reviewers
power-llvm-team
nemanjai
stefanp
kamaub
Group Reviewers
Restricted Project
Summary

This patch fixes the combines for vector_shuffles when either or both of its
left and right hand side inputs are scalar_to_vector nodes.

Previously, when both left and right side inputs are scalar_to_vector nodes,
the current combine could not handle this situation, as the shuffle mask was
updated incorrectly. https://reviews.llvm.org/D127818 was a temporary solution
to this issue. Now, not only does this patch aim to resolve the previous issue the
of incorrect shuffle mask adjustments respectively, it also updates any test cases
that are affected by this change.

Diff Detail

Event Timeline

amyk created this revision.Jul 25 2022, 7:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2022, 7:23 AM
amyk requested review of this revision.Jul 25 2022, 7:23 AM
Herald added a project: Restricted Project. · View Herald TranscriptJul 25 2022, 7:23 AM
amyk updated this revision to Diff 452354.Aug 12 2022, 7:00 PM

Rebase patch based on new tests added within D130485.

Can you also comment on whether this was thoroughly tested on both little endian and big endian systems (bootstrap, test-suite, SPEC, additional internal tests).

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14917–14918

I don't follow why we need these here. They both seem to only be needed in the respective conditions (i.e. depending on whether the LHS/RHS are scalar_to_vector nodes). And within those conditional blocks, they are reset before they're used.

So why do we need to define them here and initialize them to the width of a vector?

llvm/test/CodeGen/PowerPC/v16i8_scalar_to_vector_shuffle.ll
267

The code for this one gets worse on big endian. Do we know why?

347

The code for this one gets worse on big endian. Do we know why?

578

The code for this one gets worse on big endian. Do we know why?

659

The code for this one gets worse on big endian. Do we know why?

1431

The code for this one gets worse on big endian. Do we know why?

1658

The code for this one gets worse on big endian. Do we know why?

There are probably a bunch of other places. Can you please review what is happening there? I'll stop adding further similar comments.

llvm/test/CodeGen/PowerPC/v4i32_scalar_to_vector_shuffle.ll
123 ↗(On Diff #452354)

The code generated for this one gets worse on all subtargets. Do we know why?

amyk updated this revision to Diff 479927.Dec 4 2022, 9:15 AM

Discussed this patch with Nemanja outside of the review.

Many of the cases where we get worse codegen on BE (and some on LE as well) within the test cases occur either when the shuffle mask is explicitly asking for undefined vector elements from the original scalar_to_vector, or when the values in the vector are partially defined in the cases where the scalar_to_vector element size is smaller than the vector_shuffle element size. These cases primarily seem to appear within these contrived test cases, rather than real code. As a result, I've rebased this patch and addressed some review comments to move forward with this patch.

amyk added a comment.Dec 4 2022, 9:18 AM

@nemanjai Yeah, at the time of posting the patch, I did do little endian and big endian system tests as you mentioned. Since it's been awhile and I need update the patch, I'd like to retest these particular runs with this patch.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14917–14918

Ya, I think I had that in the beginning and meant to remove it prior to putting up the patch but didn't realize that it was still left in there. I don't believe I need it either, so I've removed it.

nemanjai accepted this revision.Feb 24 2023, 4:36 AM

Although I have a fair number of comments, they're mostly stylistic comments that probably don't really require another revision. So LGTM and please address the comments prior to committing.

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14847

I don't think we modify ShuffV so it should be a const reference.

14848–14849

These two values are actually the last *element* rather than *byte* aren't they? If so, please rename accordingly.

14855

Why do we only check here if LHSLastByteDefined >= 0 and similarly for the RHS below? Do we really want to pretend that a shuffle mask is in range if the last byte/element defined is undefined (i.e. presumably no bytes are defined)?

14903–14904

The name LaneWidth is misleading here. I kept thinking the width (i.e. the number of bits) of the lane that contains a defined value. But it is actually the number of valid elements in the vector.
For a node:

(shuff (v4i32 s_to_v i32), arbitrary_v4i32), mask)
LHSValidLaneWidth = 1
RHSValidLaneWidth = 4

And for a node:

(shuff (v4i32 s_to_v i32), (bitcast (s_to_v i64), v4i32), mask)
LHSValidLaneWidth = 1
RHSValidLaneWidth = 2

If I'm interpreting it correctly, please rename them to something like NumValidElts.

14932

I think we should have an early exit here if the valid lane width is zero:

if (LHSValidLaneWidth == 0)
  return false;

Since it is not really reasonable to do this transform if we are pulling in more bits than the original scalar_to_vector actually defined. Similarly with the RHS below.

14933

Nit: maybe a comment to make this clearer:

// The last element that comes from the LHS. For example:
// (shuff (s_to_v i32), (bitcast (s_to_v i64), v4i32), ...)
// The last element that comes from the LHS is actually 0, not 3
// because elements 1 and higher of a scalar_to_vector are undefined.
14945

Similar nit as above. A comment along the lines of:

// The last element that comes from the RHS. For example:
// (shuff (s_to_v i32), (bitcast (s_to_v i64), v4i32), ...)
// The last element that comes from the RHS is actually 5, not 7
// because elements 1 and higher of a scalar_to_vector are undefined.
// It is also not 4 because the original scalar_to_vector is wider and
// actually contains two i32 elements.
This revision is now accepted and ready to land.Feb 24 2023, 4:36 AM
kamaub added a subscriber: kamaub.Feb 27 2023, 10:27 AM
kamaub added inline comments.
llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14932

move LHSScalarSize up and use it in this block

14944

Same here, please raise RHSScalarSize and reuse it.

kamaub requested changes to this revision.Feb 28 2023, 9:49 AM

Request changes because of the bug in the isShuffleMaskInRange() conditions

llvm/lib/Target/PowerPC/PPCISelLowering.cpp
14850

please use a range based loop

14855–14856
14856

Please change this to > (and for below).

14860–14861
14907–14908

please expand this comment to point out that -1 means all elements are defined

This revision now requires changes to proceed.Feb 28 2023, 9:49 AM