This is an archive of the discontinued LLVM Phabricator instance.

[DAG] Combine insert(shuffle(load), load, 0) into a single load
ClosedPublic

Authored by dmgreen on May 20 2023, 5:58 AM.

Details

Summary

Given an insert of a scalar load into a vector shuffle with mask u,0,1,2,3,4,5,6 or 1,2,3,4,5,6,7,u (depending on the insert index), it can be more profitable to convert to a single load and avoid the shuffles. This adds a DAG combine for it, providing the new load is still fast.

Diff Detail

Event Timeline

dmgreen created this revision.May 20 2023, 5:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 20 2023, 5:58 AM
dmgreen requested review of this revision.May 20 2023, 5:58 AM
Herald added a project: Restricted Project. · View Herald TranscriptMay 20 2023, 5:58 AM
RKSimon added inline comments.May 20 2023, 9:52 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
20989
auto *ScalarLoad = dyn_cast<LoadSDNode>(Scalar);
if (!ScalarLoad)
  return SDValue();
21000
auto *VecLoad = dyn_cast<LoadSDNode>(Vec);
if (!VecLoad ||
      Vec.getValueType().getScalarType() != Scalar.getValueType())
    return SDValue();
21018

You might be able to use DAG.areNonVolatileConsecutiveLoads for some/all of these checks?

21038

Is all this correct for big-endian?

dmgreen updated this revision to Diff 524178.May 22 2023, 12:36 AM

Thanks for taking a look. Update to use areNonVolatileConsecutiveLoads. areNonVolatileConsecutiveLoads will check the size of Base is Bytes, so this passes -1 for the Dist.

dmgreen added inline comments.May 22 2023, 12:38 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
21038

I believe so. I originally had this disabled for bigendian, but as far as I can tell it is OK as we are always dealing with elements of the same size, which will be in the same positions between big and little endian. Let me know if that doesn't sound right.

RKSimon added inline comments.May 23 2023, 9:11 AM
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
20972

I think we could allow undef mask elements as well, but we'd have to ensure that we could still deference the entire vector width?

dmgreen updated this revision to Diff 524784.May 23 2023, 10:08 AM

Handle undef lanes and add a couple of extra tests for them.

llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp
20972

Sounds good. I think, as we have the original full load, we can dereference the whole vector even if the elements are the shuffle are undef (if I've understanding you correctly). We might have a "less undefined" output though, which knows less lanes are undef/poison.

This revision is now accepted and ready to land.May 30 2023, 10:26 AM
This revision was landed with ongoing or failed builds.May 31 2023, 11:49 AM
This revision was automatically updated to reflect the committed changes.