Given an insert of a scalar load into a vector shuffle with mask u,0,1,2,3,4,5,6 or 1,2,3,4,5,6,7,u (depending on the insert index), it can be more profitable to convert to a single load and avoid the shuffles. This adds a DAG combine for it, providing the new load is still fast.
Details
Diff Detail
- Repository
- rG LLVM Github Monorepo
Event Timeline
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
20989 | auto *ScalarLoad = dyn_cast<LoadSDNode>(Scalar); if (!ScalarLoad) return SDValue(); | |
21000 | auto *VecLoad = dyn_cast<LoadSDNode>(Vec); if (!VecLoad || Vec.getValueType().getScalarType() != Scalar.getValueType()) return SDValue(); | |
21018 | You might be able to use DAG.areNonVolatileConsecutiveLoads for some/all of these checks? | |
21038 | Is all this correct for big-endian? |
Thanks for taking a look. Update to use areNonVolatileConsecutiveLoads. areNonVolatileConsecutiveLoads will check the size of Base is Bytes, so this passes -1 for the Dist.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
21038 | I believe so. I originally had this disabled for bigendian, but as far as I can tell it is OK as we are always dealing with elements of the same size, which will be in the same positions between big and little endian. Let me know if that doesn't sound right. |
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
20972 | I think we could allow undef mask elements as well, but we'd have to ensure that we could still deference the entire vector width? |
Handle undef lanes and add a couple of extra tests for them.
llvm/lib/CodeGen/SelectionDAG/DAGCombiner.cpp | ||
---|---|---|
20972 | Sounds good. I think, as we have the original full load, we can dereference the whole vector even if the elements are the shuffle are undef (if I've understanding you correctly). We might have a "less undefined" output though, which knows less lanes are undef/poison. |
I think we could allow undef mask elements as well, but we'd have to ensure that we could still deference the entire vector width?